
TiMOS R16 rebooting (Fatal Error: Core0 DEAD)

Posted: Fri Oct 30, 2020 2:28 pm
by ablongo
Hi Guys!

My lab (TiMOS R16) is facing rebooting events whenever 3 or more nodes are up and running in the lab: after a short time the nodes start rebooting with "Fatal Error: Core0 Dead". Do you guys know how to solve this issue?
See more details below.

### My MiniPC config ( Intel i9 / 64 GB RAM / 1 TB SSD / Ubuntu 20 )
EVE-NG Pro running on VMware Workstation
---- VMware Application Settings ( Memory Allocation: 56224 MB RAM )
-------- EVE-NG Pro Image Settings ( 4 Processors with 3 Cores Per Processor = 12 Cores / 52128 MB RAM / Virtualize Intel VT-x: Enabled )

### TiMOS Node Settings
# Nokia 7750 SR - Distributed - CPM Card Nodes
CPU: 2 CPU Limit: Enabled RAM (MB): 4096 Ethernets: 2
Management Address: 192.168.2.204/24@active
TiMOS License Path: cf3:\Universal-License.txt
TiMOS Line: slot=A chassis=SR-12 card=sfm4-12
QEMU Custom Options: -machine type=pc,accel=kvm -serial mon:stdio -nographic -no-user-config -nodefaults -rtc base=utc

# Nokia 7750 SR - Distributed - IOM Card Nodes
CPU: 2 CPU Limit: Enabled RAM (MB): 2048 Ethernets: 9
TiMOS Line: slot=2 chassis=SR-12 card=iom3-xp-b mda/1=m12-1gb+2-10gb-xp mda/2=isa-tunnel control-cpu-cores=2
QEMU Custom Options: -machine type=pc,accel=kvm -serial mon:stdio -nographic -no-user-config -nodefaults -rtc base=utc



### ERROR LOG
Watchdog: Task 0x1eb192b0 (sysMonitor) blocked for 146 ticks.

Fatal Error: CORE0 dead.

***************************************************************

Disabling switch fabric and mgmt ethernet communications

***************************************************************
9a4da0d vxTaskEntry +1d : sysMonitorTask (0, 0, 0, 0)
6fcd67f sysMonitorTask +26f: wdCheckForFailedCores (0, 0, 0, 0)
6fca663 wdCheckForFailedCores+133: timosCrashDumpGeneral (20d98660, 0, 6564203045524f43, 6461)
1bc6115 timosCrashDumpGeneral+25 : timosCrashDumpSystemState (0, 1, 20d986b0, 6fca668)
1bc5f68 timosCrashDumpSystemState+a8 : debugDisplayBootLog (5f9c1bdc20d98650, 1a, 2054434f20495246, 37353a3331203033)
1be3791 debugDisplayBootLog+11 : debugCloseBootLog (1, 1, 20d98610, 1bc5f6d)
1be36f8 debugCloseBootLog+28 : debugSaveBootLog (20d98590, 1bc4c65, 12a410, 1)
1be3667 debugSaveBootLog+77 : debugWriteBootLog (20d983e0, 9abe632, a8386b0, ffffffff)
1be35c7 debugWriteBootLog+57 : closeURL (ffffffff00000077, 0, 1, 0)
1ab4790 closeURL +10 : urlCloseFile (20d983b0, 1be35cc, ffffffff00000077, 0)
1ab1cb2 urlCloseFile +52 : close (1468, 1468, 20d981d0, 1ab4792)
9abce44 close +4 : iosClose (20d981c0, 1ab1cb7, 1468, 1468)
9abe4dc iosClose +5c : dosFsClose (400000000000000, 20d981e0, 0, 20d981e0)
9a5d0e0 dosFsClose +180: cbioIoctl (12a410, 1fba1ed0, 77, 0)
9b158a3 cbioIoctl +33 : dpartIoctl (20d98110, 1f8345e0, 1f839fa0, 1f82e510)
9a63f3b dpartIoctl +11b: dcacheIoctl (1f83af38, cb100010, 0, 0)
9a57924 dcacheIoctl +2e4: dcacheQuickFlush (20d980b0, 1f83af38, cb100010, 0)
9a57f0e dcacheQuickFlush+9e : dcacheManyFlushInval (1f843d20, 0, 1911b060, 0)
9a569ab dcacheManyFlushInval+9b : dcacheFlushBatch (1f843cb0, ffffffff00000000, 0, f00000001)
9a5686e dcacheFlushBatch+1de: blkWrapBlkRW (1f83ac00, 0, 1f843d20, 1911b070)
9b161d8 blkWrapBlkRW +88 : ataBlkWrt (1f89b510, 100000000, 2a700000001, 2a600000000)
9b11d0a ataBlkWrt +a : ataBlkRW (20d97f60, 9b161db, 1f89b510, 100000000)
9b11f50 ataBlkRW +130: sysOutWordString (1, 1, 19111ed8, 0)
*** Blocked task info during crash dump - ending ***

Rebooting...
Using preloaded VxWorks boot loader at 0x0000000000008000, size 0x0007D000, entrypoint 0x0000000000008010

Re: TiMOS R16 rebooting (Fatal Error: Core0 DEAD)

Posted: Fri Oct 30, 2020 6:05 pm
by ablongo
I have all the licenses I need, so that is not the problem.
And I followed the EVE-Intel templates (https://gitlab.com/eve-ng-dev/templates ... ster/intel) for the node settings.

When I use only 2 (two) nodes, it is perfect, no issues.
This rebooting problem occurs only when I grow the topology into something a little more complex, with 4-6 nodes up and running and exchanging IGP protocols (IS-IS, OSPF or RIP), for example.

Re: TiMOS R16 rebooting (Fatal Error: Core0 DEAD)

Posted: Sun Nov 01, 2020 8:26 am
by Uldis (UD)
And how much CPU and RAM does your EVE have, sir?
Running topologies needs CPU resources...

Re: TiMOS R16 rebooting (Fatal Error: Core0 DEAD)

Posted: Mon Nov 02, 2020 6:18 pm
by ablongo
Uldis (UD) wrote:
Sun Nov 01, 2020 8:26 am
And how much CPU and RAM does your EVE have, sir?
Running topologies needs CPU resources...
### My MiniPC config is an Intel i9 processor, 64 GB RAM, 1 TB SSD, running Ubuntu 20
EVE-NG Pro running on VMware Workstation
---- VMware Application Settings ( Memory Allocation: 56224 MB RAM )
-------- EVE-NG Pro Image Settings ( 4 Processors with 3 Cores Per Processor = 12 Cores / 52128 MB RAM / Virtualize Intel VT-x: Enabled )

If each TiMOS router in the topology consumes 6 GB (4 GB for the CPM + 2 GB for the IOM), then I understand that, in theory, my environment should be able to run 8 nodes of 6 GB each.
I am able to bring 6 TiMOS routers (6 GB each) up without a problem; the rebooting issue begins as soon as any routing protocol (RIP, OSPF, IS-IS, BGP) is running. A simple ping/traceroute from one point of the topology to another is enough to trigger the rebooting too.
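
Just to show the math behind that, here is a quick sketch (only a back-of-the-envelope check; the values are the RAM settings from my setup above, and it only looks at memory):

# Back-of-the-envelope RAM budget, using the numbers from my settings above.
eve_ram_mb = 52128                         # RAM assigned to the EVE-NG VM
cpm_ram_mb = 4096                          # one CPM card node
iom_ram_mb = 2048                          # one IOM card node
router_ram_mb = cpm_ram_mb + iom_ram_mb    # 6144 MB per distributed router
print("Routers that fit in RAM:", eve_ram_mb // router_ram_mb)   # -> 8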

Re: TiMOS R16 rebooting (Fatal Error: Core0 DEAD)

Posted: Tue Nov 03, 2020 9:54 am
by Uldis (UD)
First, I suggest setting the VM CPU like this:
8/1, 12/1, 16/1 and so on (processors / cores per processor); it is best practice and has been tested for years for EVE in virtual environments.
Every Nokia set (CPM card plus IOM card) requires 4 CPU cores in total.
A single combo node needs 2 CPU cores and 6 GB RAM.
That means, if you look at your CPU assignment, the maximum your EVE can afford to run stably is 4-5 Nokia units.
Let's say 3x CPM and 3x IOM (6 units, each with 2 CPU cores = 12).
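
Just to illustrate that core math, a rough sketch (the numbers are the ones from this thread: 12 cores given to the EVE VM, 2 vCPUs per CPM node and per IOM node):

# CPU budget sketch for the EVE VM, using the numbers from this thread.
eve_vcpus = 12                       # cores assigned to the EVE-NG VM
vcpus_per_node = 2                   # each CPM node and each IOM node uses 2 vCPUs
vcpus_per_set = 2 * vcpus_per_node   # one distributed router (CPM + IOM) = 4 vCPUs
print("Nodes that exactly fill the 12 vCPUs:", eve_vcpus // vcpus_per_node)  # -> 6 (3x CPM + 3x IOM)
print("CPM+IOM sets that exactly fill them:", eve_vcpus // vcpus_per_set)    # -> 3
# Leaving some cores free for EVE itself, 4-5 Nokia units is the practical limit.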

My Nokia settings:
Single CPM Node:
Timos line: slot=A chassis=sr-1s card=cpm-1s slot=1 chassis=sr-1s card=xcm-1s mda/1=s18-100gb-qsfp28
qemu line: -machine type=pc,accel=kvm -serial mon:stdio -nographic -no-user-config -nodefaults -rtc base=utc -cpu host

If set of 2 nodes CPM+IOM:
CPM node:
Timos: slot=A chassis=SR-12 card=cpm5
Qemu: -machine type=pc,accel=kvm -serial mon:stdio -nographic -no-user-config -nodefaults -rtc base=utc

IOM Card 1:
Timos: slot=1 chassis=sr-12 slot=1 card=iom3-xp-b mda/1=m10-1gb-xp-sfp
Qemu: -machine type=pc,accel=kvm -serial mon:stdio -nographic -no-user-config -nodefaults -rtc base=utc

IOM Card 2:
Timos: slot=2 chassis=sr-12 slot=1 card=iom3-xp-b mda/1=m10-1gb-xp-sfp
Qemu: -machine type=pc,accel=kvm -serial mon:stdio -nographic -no-user-config -nodefaults -rtc base=utc

All nodes are using QEMU version 4.1.0 and the e1000 NIC.
Nokia version: 20.2.R1

Re: TiMOS R16 rebooting (Fatal Error: Core0 DEAD)

Posted: Tue Nov 03, 2020 2:56 pm
by ablongo
Uldis (UD) wrote:
Tue Nov 03, 2020 9:54 am
First, I suggest setting the VM CPU like this:
8/1, 12/1, 16/1 and so on (processors / cores per processor); it is best practice and has been tested for years for EVE in virtual environments.
Every Nokia set (CPM card plus IOM card) requires 4 CPU cores in total.
A single combo node needs 2 CPU cores and 6 GB RAM.
That means, if you look at your CPU assignment, the maximum your EVE can afford to run stably is 4-5 Nokia units.
Let's say 3x CPM and 3x IOM (6 units, each with 2 CPU cores = 12).

My Nokia settings:
Single CPM Node:
Timos line: slot=A chassis=sr-1s card=cpm-1s slot=1 chassis=sr-1s card=xcm-1s mda/1=s18-100gb-qsfp28
qemu line: -machine type=pc,accel=kvm -serial mon:stdio -nographic -no-user-config -nodefaults -rtc base=utc -cpu host

If set of 2 nodes CPM+IOM:
CPM node:
Timos: slot=A chassis=SR-12 card=cpm5
Qemu: -machine type=pc,accel=kvm -serial mon:stdio -nographic -no-user-config -nodefaults -rtc base=utc

IOM Card 1:
Timos: slot=1 chassis=sr-12 slot=1 card=iom3-xp-b mda/1=m10-1gb-xp-sfp
Qemu: -machine type=pc,accel=kvm -serial mon:stdio -nographic -no-user-config -nodefaults -rtc base=utc

IOM Card 2:
Timos: slot=2 chassis=sr-12 slot=1 card=iom3-xp-b mda/1=m10-1gb-xp-sfp
Qemu: -machine type=pc,accel=kvm -serial mon:stdio -nographic -no-user-config -nodefaults -rtc base=utc

All nodes are using QEMU version 4.1.0 and the e1000 NIC.
Nokia version: 20.2.R1
Thank you Uldis, I will try it and post the results here later! Thank you for your help!!