
EXOS nodes do not see each other in lab and sharing LACP LAG is flapping

Posted: Fri Oct 01, 2021 9:00 pm
by aleksandr_vd
Hello guys!
I have EVE-NG Community 2.0.3-112 running as a VM on a standalone ESXi 6.7 host (ProLiant BL460c Gen8). The eve-ng VM passes kvm-ok and has 24 vCPUs, 64 GB RAM and a 150 GB HDD.
I have two EXOS nodes (12 ports each) connected to each other through ports 10 and 11, like this:
Image
The other nodes in the lab are stopped.
Ports 10-11 are configured as a sharing group with LACP for the MLAG ISC, but the ISC peers cannot ping each other.
In addition, sharing LACP LAG 10 flaps persistently on both nodes.

Code: Select all

* vCore-4.54 # show lacp lag 10

Lag   Actor    Actor  Partner           Partner  Partner Agg   Actor
      Sys-Pri  Key    MAC               Sys-Pri  Key     Count MAC
--------------------------------------------------------------------------------
10          0  0x03f2 00:00:00:00:00:00       0  0x0000      0 50:00:00:04:00:00

Port list:

Member     Port      Rx           Sel          Mux            Actor     Partner
Port       Priority  State        Logic        State          Flags     Port
--------------------------------------------------------------------------------
10         0         Initialize   Unselected   Detached       A-G-----  0
11         0         Initialize   Unselected   Detached       A-G-----  0
================================================================================
Actor Flags: A-Activity, T-Timeout, G-Aggregation, S-Synchronization
             C-Collecting, D-Distributing, F-Defaulted, E-Expired


* vCore-4.55 # show lacp lag 10

Lag   Actor    Actor  Partner           Partner  Partner Agg   Actor
      Sys-Pri  Key    MAC               Sys-Pri  Key     Count MAC
--------------------------------------------------------------------------------
10          0  0x03f2 50:00:00:03:00:00       0  0x03f2      0 50:00:00:04:00:00

Port list:

Member     Port      Rx           Sel          Mux            Actor     Partner
Port       Priority  State        Logic        State          Flags     Port
--------------------------------------------------------------------------------
10         0         Current      Selected     Waiting        A-G-----  1011
11         0         Current      Selected     Waiting        A-G-----  1010
================================================================================
Actor Flags: A-Activity, T-Timeout, G-Aggregation, S-Synchronization
             C-Collecting, D-Distributing, F-Defaulted, E-Expired

* vCore-4.56 # show lacp lag 10

Lag   Actor    Actor  Partner           Partner  Partner Agg   Actor
      Sys-Pri  Key    MAC               Sys-Pri  Key     Count MAC
--------------------------------------------------------------------------------
10          0  0x03f2 50:00:00:03:00:00       0  0x03f2      0 50:00:00:04:00:00

Port list:

Member     Port      Rx           Sel          Mux            Actor     Partner
Port       Priority  State        Logic        State          Flags     Port
--------------------------------------------------------------------------------
10         0         Current      Selected     Attached       A-GS----  1011
11         0         Idle         Unselected   Detached       --------  0
================================================================================
Actor Flags: A-Activity, T-Timeout, G-Aggregation, S-Synchronization
             C-Collecting, D-Distributing, F-Defaulted, E-Expired

* vCore-4.56 # show lacp lag 10

Lag   Actor    Actor  Partner           Partner  Partner Agg   Actor
      Sys-Pri  Key    MAC               Sys-Pri  Key     Count MAC
--------------------------------------------------------------------------------
10          0  0x03f2 50:00:00:03:00:00       0  0x03f2      0 50:00:00:04:00:00

Port list:

Member     Port      Rx           Sel          Mux            Actor     Partner
Port       Priority  State        Logic        State          Flags     Port
--------------------------------------------------------------------------------
10         0         Current      Selected     Attached       A-GS----  1011
11         0         Current      Selected     Attached       A-GS----  1010
================================================================================
Actor Flags: A-Activity, T-Timeout, G-Aggregation, S-Synchronization
             C-Collecting, D-Distributing, F-Defaulted, E-Expired

* vCore-4.56 #
I also captured traffic on port 10 of vCore-3 and see that LACP PDUs are not received and transmitted at a steady rate.
See my PCAP: https://www.cloudshark.org/captures/fdc3a6935595
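
To put a number on "not at a steady rate", here is a minimal Python sketch that flags long gaps between LACP PDUs. It assumes the PDU arrival times have already been exported from the capture as seconds, and it assumes the slow LACP rate (one PDU every 30 seconds), which the unset T flag in the outputs above suggests. The function name, the tolerance factor and the sample timestamps are hypothetical, not taken from the PCAP.

```python
def find_lacp_gaps(timestamps, period=30.0, tolerance=1.5):
    """Return (previous, current, gap) for every pair of consecutive PDUs
    that are further apart than period * tolerance seconds."""
    ordered = sorted(timestamps)
    limit = period * tolerance
    return [
        (prev, cur, cur - prev)
        for prev, cur in zip(ordered, ordered[1:])
        if cur - prev > limit
    ]


# Hypothetical arrival times in seconds, for illustration only.
example_times = [0.0, 30.0, 60.0, 151.0, 181.0]

for prev, cur, gap in find_lacp_gaps(example_times):
    print(f"gap of {gap:.1f} s between PDUs at {prev:.1f} s and {cur:.1f} s")
```

With the slow rate the long timeout is 90 seconds, so any gap near or above that would be enough for the partner to expire the port.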

Also, the errors below sometimes appear on both nodes alternately:

Code: Select all

[ 1218.768289] CPU 0: soft watchdog expiration warning at 0010:ffffffffc00e7b9b (getTxPifFromLif+0x2b/0x770 [exvlan]) for 785 seconds, process vsm (767)
[ 1223.968240] CPU 0: soft watchdog expiration warning at 0010:ffffffffc00e7b8f (getTxPifFromLif+0x1f/0x770 [exvlan]) for 791 seconds, process vsm (767)
[ 1228.968244] CPU 0: soft watchdog expiration warning at 0010:ffffffffc00f7d08 (getNextPifOnLif+0x18/0x50 [exvlan]) for 796 seconds, process vsm (767)
[ 1233.968246] CPU 0: soft watchdog expiration warning at 0010:ffffffffc00e7b94 (getTxPifFromLif+0x24/0x770 [exvlan]) for 801 seconds, process vsm (767)

[ 2425.912379] CPU 0: soft watchdog expiration warning at 0010:ffffffffc01dfd31 (getNextPifOnLif+0x41/0x50 [exvlan]) for 37 seconds, process swapper/0 (0)
[ 2431.112383] CPU 0: soft watchdog expiration warning at 0010:ffffffffc01dfcda (getFirstPifOnLif+0x1a/0x30 [exvlan]) for 42 seconds, process swapper/0 (0)
[ 2436.112638] CPU 0: soft watchdog expiration warning at 0010:ffffffffc01dfcf0 (getNextPifOnLif+0x0/0x50 [exvlan]) for 47 seconds, process swapper/0 (0)
[ 2441.312367] CPU 0: soft watchdog expiration warning at 0010:ffffffffc01cfb8f (getTxPifFromLif+0x1f/0x770 [exvlan]) for 53 seconds, process swapper/0 (0)
These errors appear randomly and do not depend on EXOS configuration changes or on show command output: no configuration changes were made, and the error can still appear on a node.
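
To see which functions the watchdog keeps landing in, the console output can be summarized with a short script. This is only a sketch: the regular expression is written against the lines quoted above, the helper name `summarize_watchdog` is made up, and the sample is the first five warnings from this post.

```python
import re
from collections import Counter

# Matches the soft watchdog lines exactly as they are quoted in this post.
WATCHDOG_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+CPU (?P<cpu>\d+): "
    r"soft watchdog expiration warning at [0-9a-f]+:[0-9a-f]+ "
    r"\((?P<func>\w+)\+0x[0-9a-f]+/0x[0-9a-f]+ \[(?P<module>\w+)\]\) "
    r"for (?P<secs>\d+) seconds, process (?P<proc>\S+) \((?P<pid>\d+)\)"
)


def summarize_watchdog(log_text):
    """Count warnings per (module, function) and keep the longest stall per process."""
    per_function = Counter()
    longest_stall = {}
    for line in log_text.splitlines():
        m = WATCHDOG_RE.search(line)
        if m is None:
            continue  # not a watchdog line
        per_function[(m["module"], m["func"])] += 1
        proc = m["proc"]
        longest_stall[proc] = max(longest_stall.get(proc, 0), int(m["secs"]))
    return per_function, longest_stall


SAMPLE = """\
[ 1218.768289] CPU 0: soft watchdog expiration warning at 0010:ffffffffc00e7b9b (getTxPifFromLif+0x2b/0x770 [exvlan]) for 785 seconds, process vsm (767)
[ 1223.968240] CPU 0: soft watchdog expiration warning at 0010:ffffffffc00e7b8f (getTxPifFromLif+0x1f/0x770 [exvlan]) for 791 seconds, process vsm (767)
[ 1228.968244] CPU 0: soft watchdog expiration warning at 0010:ffffffffc00f7d08 (getNextPifOnLif+0x18/0x50 [exvlan]) for 796 seconds, process vsm (767)
[ 1233.968246] CPU 0: soft watchdog expiration warning at 0010:ffffffffc00e7b94 (getTxPifFromLif+0x24/0x770 [exvlan]) for 801 seconds, process vsm (767)
[ 2425.912379] CPU 0: soft watchdog expiration warning at 0010:ffffffffc01dfd31 (getNextPifOnLif+0x41/0x50 [exvlan]) for 37 seconds, process swapper/0 (0)
"""

per_function, longest_stall = summarize_watchdog(SAMPLE)
for (module, func), count in per_function.most_common():
    print(f"{module}:{func} -> {count} warnings")
for proc, secs in longest_stall.items():
    print(f"{proc}: longest reported stall {secs} s")
```

In the lines quoted here every warning falls inside the exvlan module while walking the ports of a logical interface (getTxPifFromLif, getNextPifOnLif, getFirstPifOnLif), so grouping by function is mainly a way to confirm that pattern over a longer log.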

The overall load on EVE is insignificant:
Image

But one vCPU goes to 100% load when the error appears on a node:

Code: Select all

root@eve-ng:/opt/unetlab/data/Logs# top
top - 23:00:07 up 7 days,  8:28,  2 users,  load average: 1.12, 1.13, 1.10
Tasks: 401 total,   2 running, 233 sleeping,   0 stopped,   0 zombie
%Cpu0  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu1  :100.0 us,  0.0 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu2  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu3  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu4  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu5  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu6  :  0.0 us,  0.3 sy,  0.0 ni, 99.7 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu7  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu8  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu9  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu10 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu11 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu12 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu13 :  0.0 us,  1.0 sy,  0.0 ni, 99.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu14 :  0.0 us,  0.7 sy,  0.0 ni, 99.3 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu15 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu16 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu17 :  0.7 us,  1.0 sy,  0.0 ni, 98.3 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu18 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu19 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu20 :  0.3 us,  0.7 sy,  0.0 ni, 99.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu21 :  3.4 us,  8.2 sy,  0.0 ni, 88.4 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu22 :  0.0 us,  0.7 sy,  0.0 ni, 99.3 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu23 :  0.0 us,  0.3 sy,  0.0 ni, 99.7 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem : 65965820 total, 47423740 free,  1457700 used, 17084380 buff/cache
KiB Swap:  1097724 total,  1097724 free,        0 used. 63651316 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
3621 root      20   0 2960272 470992  26156 S  99.7  0.7  39:11.66 qemu-system-x86
22524 root      20   0 2960272 477268  26048 S  13.5  0.7  12:26.20 qemu-system-x86
  350 root      25   5       0      0      0 S   0.7  0.0  59:25.44 uksmd
32267 root      20   0   42356   3924   3076 R   0.7  0.0   0:00.65 top
   11 root      20   0       0      0      0 I   0.3  0.0   6:58.29 rcu_sched
 2781 mysql     20   0 2296812  86424  20376 S   0.3  0.1  19:09.24 mysqld
 3641 root      20   0       0      0      0 I   0.3  0.0   2:50.23 kworker/11:1-ev
18189 root      20   0       0      0      0 I   0.3  0.0   1:37.29 kworker/20:0-ev
18823 root      20   0       0      0      0 I   0.3  0.0   0:10.73 kworker/15:1-ev
18893 root      20   0       0      0      0 I   0.3  0.0   1:31.29 kworker/21:0-ev
18994 root      20   0       0      0      0 I   0.3  0.0   1:05.22 kworker/16:2-ev
19381 root      20   0       0      0      0 I   0.3  0.0   1:42.03 kworker/22:2-ev
19673 root      20   0       0      0      0 I   0.3  0.0   2:48.97 kworker/8:2-eve
    1 root      20   0   37820   5708   3964 S   0.0  0.0   0:06.00 systemd
    2 root      20   0       0      0      0 S   0.0  0.0   0:00.24 kthreadd
    3 root       0 -20       0      0      0 I   0.0  0.0   0:00.00 rcu_gp
    4 root       0 -20       0      0      0 I   0.0  0.0   0:00.00 rcu_par_gp
    6 root       0 -20       0      0      0 I   0.0  0.0   0:00.00 kworker/0:0H-kb
    7 root      20   0       0      0      0 I   0.0  0.0   0:00.00 kworker/u48:0-s
    9 root       0 -20       0      0      0 I   0.0  0.0   0:00.00 mm_percpu_wq
   10 root      20   0       0      0      0 S   0.0  0.0   0:00.56 ksoftirqd/0
   12 root      rt   0       0      0      0 S   0.0  0.0   0:02.44 migration/0
   13 root     -51   0       0      0      0 S   0.0  0.0   0:00.00 idle_inject/0
   15 root      20   0       0      0      0 S   0.0  0.0   0:00.00 cpuhp/0
   16 root      20   0       0      0      0 S   0.0  0.0   0:00.00 cpuhp/1
   17 root     -51   0       0      0      0 S   0.0  0.0   0:00.00 idle_inject/1
root@eve-ng:/opt/unetlab/data/Logs#
Does anybody have an idea what to do next to understand what the problem is and how to fix it?

Regards, aleksandr_vd

Re: EXOS nodes do not see each other in lab and sharing LACP LAG is flapping

Posted: Tue Oct 18, 2022 4:26 pm
by aldro
With viosl2-adventerprisek9-m.ssa.high_iron_20200929, an L3 Port-Channel works with an L2 Port-Channel using LACP.