On Wed, 29 May 2024 at 13:07, Magnus Karlsson <magnus.karlsson@xxxxxxxxx> wrote: > > On Wed, 29 May 2024 at 11:08, Yuval El-Hanany <YuvalE@xxxxxxxxxxx> wrote: > > > > Hi, > > I got kernel panic on Kernel 5.15.117 (with the shared umem kernel patch). The scenario was simple. Instead of redirecting to a different cpu using CPUMAP, I’ve tried redirecting traffic from a CPU core to a different CPU core using XSK sockets. I used 2 cores. When each redirected to the other, all worked well. When redirecting traffic from both cores to one core, I got a kernel panic almost immediately under load. The flush_list seems to be per cpu, but somehow it’s messed up when two cores access the it? > > > > Thanks, > > Yuval. > > Thanks Yuval. Will take a look at it. I think the sad conclusion here is that I need to revert the patch that I sent to you. The problem is that the rings between user-space and kernel-space are single producer / single consumer and allowing two or more NAPI threads to access the same ring by performing a redirect to the same socket at the same time will break this. The flush is likely solvable, but the addition of entries to the same ring would require introducing locking (or a completely new ring type) which is going to be way too expensive. With the old code, checking that the queue_index of the queue it got the packet on being equal to the socket's queue_index would disallow this and not trigger this problem. I have no idea why I did not think about this earlier. My sincere apologies for that. I will scratch my head for a while more, but I am not hopeful that I will come up with a good solution for this. > > Two different dumps. > > > > Dump 1: > > > > 2024-05-29T01:[ 306.997548] BUG: kernel NULL pointer dereference, address: 0000000000000008 > > [ 307.088372] #PF: supervisor read access in kernel mode > > [ 307.149079] #PF: error_code(0x0000) - not-present page > > [ 307.209774] PGD 10f131067 P4D 10f131067 PUD 102642067 PMD 0 > > [ 307.276608] Oops: 0000 [#1] SMP > > [ 307.313712] CPU: 3 PID: 1919 Comm: sp1 Tainted: P OE 5.15.117-1-ULP-NG #1 > > [ 307.408219] Hardware name: Radware Radware/Default string, BIOS 5.25 (785A.015) 05/11/2023 > > [ 307.505779] RIP: 0010:xsk_flush+0xb/0x40 > > [ 307.552099] Code: a0 03 00 00 01 b8 e4 ff ff ff eb dc 49 83 85 a0 03 00 00 01 b8 e4 ff ff ff eb cd 0f 1f 40 00 48 8b 87 40 03 00 00 55 48 89 e5 <8b> 50 08 48 8b 40 10 89 10 48 8b 87 68 03 00 00 48 8b 80 80 00 00 > > [ 307.773694] RSP: 0000:ffffb7ae01037c80 EFLAGS: 00010287 > > [ 307.835401] RAX: 0000000000000000 RBX: ffffa0a88f8ab768 RCX: ffffa0a88f8abac0 > > [ 307.919670] RDX: ffffa0a88f8abac0 RSI: 0000000000000004 RDI: ffffa0a88f8ab768 > > [ 308.003922] RBP: ffffb7ae01037c80 R08: ffffa0a10b3e0000 R09: 000000000000769f > > [ 308.088172] R10: ffffa0a1035ca000 R11: 000000000d7f9180 R12: ffffa0a88f8ab768 > > [ 308.172405] R13: ffffa0a88f8ebac0 R14: ffffa0a2ef135300 R15: 0000000000000155 > > [ 308.256635] FS: 00007ffff7e97a80(0000) GS:ffffa0a88f8c0000(0000) knlGS:0000000000000000 > > [ 308.352186] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > [ 308.420043] CR2: 0000000000000008 CR3: 000000010cf6e000 CR4: 0000000000750ee0 > > [ 308.504309] PKRU: 55555554 > > [ 308.536296] Call Trace: > > [ 308.565209] <TASK> > > [ 308.590026] ? show_regs+0x56/0x60 > > [ 308.630218] ? __die_body+0x1a/0x60 > > [ 308.671433] ? __die+0x25/0x30 > > [ 308.707529] ? page_fault_oops+0xc0/0x440 > > [ 308.754897] ? do_sys_poll+0x47c/0x5e0 > > [ 308.799188] ? do_user_addr_fault+0x319/0x6e0 > > [ 308.850659] ? exc_page_fault+0x6c/0x130 > > [ 308.896992] ? asm_exc_page_fault+0x27/0x30 > > [ 308.946398] ? xsk_flush+0xb/0x40 > > [ 308.985546] __xsk_map_flush+0x3a/0x80 > > [ 309.029824] xdp_do_flush+0x13/0x20 > > [ 309.071043] i40e_finalize_xdp_rx+0x44/0x50 [i40e] > > 56:07-08:00 NOTI[ 309.127653] i40e_clean_rx_irq_zc+0x132/0x500 [i40e] > > [ 309.202736] i40e_napi_poll+0x119/0x1270 [i40e] > > CE slb: real se[ 309.256285] ? xsk_sendmsg+0xf4/0x100 > > [ 309.315969] ? sock_sendmsg+0x2e/0x40 > > [ 309.359244] __napi_poll+0x23/0x160 > > [ 309.400482] net_rx_action+0x232/0x290 > > [ 309.444778] __do_softirq+0xd0/0x270 > > [ 309.487012] irq_exit_rcu+0x74/0xa0 > > [ 309.528241] common_interrupt+0x83/0xa0 > > [ 309.573577] asm_common_interrupt+0x27/0x40 > > [ 309.623017] RIP: 0033:0x5cb685 > > [ 309.659115] Code: b0 fb ff ff 48 8b 8d b8 fb ff ff 44 89 f8 23 44 91 08 74 09 31 d2 89 d1 e9 22 fe ff ff 83 85 88 fb ff ff 01 8b 95 88 fb ff ff <39> 15 79 31 1d 04 0f 87 c3 fd ff ff 4c 8b 9d b8 fb ff ff e9 a2 f8 > > [ 309.880853] RSP: 002b:00007fffffffd820 EFLAGS: 00000206 > > [ 309.942608] RAX: 0000000000000000 RBX: 0000000000000047 RCX: 00000000050912bc > > [ 310.026922] RDX: 0000000000000033 RSI: 0000000000000000 RDI: 0000000000000001 > > [ 310.111228] RBP: 00007fffffffdd50 R08: 0000000000000001 R09: 0000000000000008 > > [ 310.195548] R10: 0000000002f68e00 R11: 00000000050912bc R12: 0000000000000047 > > [ 310.279877] R13: 0000000000000022 R14: 0000000004f08560 R15: 00000000ff55557f > > [ 310.364168] </TASK> > > [ 310.390014] Modules linked in: uio i40e ec(PO) evdev ncs_acpi(O) bonding sr_mod cdrom usb_storage dwmac_intel stmmac i2c_i801 pcs_xpcs phylink i2c_smbus [last unloaded: i40e] > > rvice r5, IP 4.4[ 310.573779] CR2: 0000000000000008 > > [ 310.629359] ---[ end trace 78cf9f96f477759d ]--- > > .4.5:80 operational, affected virt 2.2.2.10 > > > > Dump 2: > > > > 2024-05-29T01:56:07-08:00 NOTICE slb: real serve[ 311.250217] RIP: 0010:xsk_flush+0xb/0x40 > > [ 311.300334] Code: a0 03 00 00 01 b8 e4 ff ff ff eb dc 49 83 85 a0 03 00 00 01 b8 e4 ff ff ff eb cd 0f 1f 40 00 48 8b 87 40 03 00 00 55 48 89 e5 <8b> 50 08 48 8b 40 10 89 10 48 8b 87 68 03 00 00 48 8b 80 80 00 00 > > [ 311.522070] RSP: 0000:ffffb7ae01037c80 EFLAGS: 00010287 > > [ 311.583811] RAX: 0000000000000000 RBX: ffffa0a88f8ab768 RCX: ffffa0a88f8abac0 > > [ 311.668118] RDX: ffffa0a88f8abac0 RSI: 0000000000000004 RDI: ffffa0a88f8ab768 > > [ 311.752405] RBP: ffffb7ae01037c80 R08: ffffa0a10b3e0000 R09: 000000000000769f > > [ 311.836716] R10: ffffa0a1035ca000 R11: 000000000d7f9180 R12: ffffa0a88f8ab768 > > [ 311.921040] R13: ffffa0a88f8ebac0 R14: ffffa0a2ef135300 R15: 0000000000000155 > > [ 312.005358] FS: 00007ffff7e97a80(0000) GS:ffffa0a88f8c0000(0000) knlGS:0000000000000000 > > [ 312.100975] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > [ 312.168886] CR2: 0000000000000008 CR3: 000000010cf6e000 CR4: 0000000000750ee0 > > [ 312.253232] PKRU: 55555554 > > [ 312.285245] Kernel panic - not syncing: Fatal exception in interrupt > > [ 312.360323] Kernel Offset: 0x3a000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) > > [ 312.570739] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]--- > > > > > > > > 2024-05-29T01:00:47-08:00 NOTICE slb: real service r16, IP 4.4.4.16:80 operational, affected [ 390.602501] BUG: kernel NULL pointer dereference, address: 0000000000000008 > > [ 390.699875] #PF: supervisor read access in kernel mode > > [ 390.760620] #PF: error_code(0x0000) - not-present page > > [ 390.821361] PGD 10d319067 P4D 10d319067 PUD 10df94067 PMD 0 > > [ 390.888266] Oops: 0000 [#1] SMP > > [ 390.925409] CPU: 3 PID: 1922 Comm: sp1 Tainted: P OE 5.15.117-1-ULP-NG #1 > > [ 391.020040] Hardware name: Radware Radware/Default string, BIOS 5.25 (785A.015) 05/11/2023 > > [ 391.117759] RIP: 0010:xsk_flush+0xb/0x40 > > [ 391.164147] Code: a0 03 00 00 01 b8 e4 ff ff ff eb dc 49 83 85 a0 03 00 00 01 b8 e4 ff ff ff eb cd 0f 1f 40 00 48 8b 87 40 03 00 00 55 48 89 e5 <8b> 50 08 48 8b 40 10 89 10 48 8b 87 68 03 00 00 48 8b 80 80 00 00 > > [ 391.386008] RSP: 0018:ffffae5c80244d48 EFLAGS: 00010287 > > [ 391.447793] RAX: 0000000000000000 RBX: ffff8bb80f8ab768 RCX: ffff8bb80f8abac0 > > [ 391.532161] RDX: ffff8bb80f8abac0 RSI: 0000000000000004 RDI: ffff8bb80f8ab768 > > [ 391.616518] RBP: ffffae5c80244d48 R08: ffff8bb0845f0000 R09: 000000000000967a > > [ 391.700883] R10: ffff8bb0833fdc00 R11: 000000000ca5a180 R12: ffff8bb80f8ab768 > > [ 391.785233] R13: ffff8bb80f8ebac0 R14: ffff8bb081571b00 R15: 0000000000000132 > > [ 391.869615] FS: 00007ffff7e97a80(0000) GS:ffff8bb80f8c0000(0000) knlGS:0000000000000000 > > [ 391.965273] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > [ 392.033224] CR2: 0000000000000008 CR3: 000000010d84b000 CR4: 0000000000750ee0 > > [ 392.117580] PKRU: 55555554 > > [ 392.149594] Call Trace: > > [ 392.178536] <IRQ> > > [ 392.202345] ? show_regs+0x56/0x60 > > [ 392.242571] ? __die_body+0x1a/0x60 > > [ 392.283806] ? __die+0x25/0x30 > > [ 392.319919] ? page_fault_oops+0xc0/0x440 > > [ 392.367315] ? i40e_msix_clean_rings+0x27/0x40 [i40e] > > [ 392.427035] ? do_user_addr_fault+0x319/0x6e0 > > [ 392.478538] ? handle_irq_event+0x41/0x60 > > [ 392.525934] ? exc_page_fault+0x6c/0x130 > > [ 392.572304] ? asm_exc_page_fault+0x27/0x30 > > [ 392.621771] ? xsk_flush+0xb/0x40 > > [ 392.660954] ? xsk_flush+0x34/0x40 > > [ 392.701174] __xsk_map_flush+0x3a/0x80 > > [ 392.745497] xdp_do_flush+0x13/0x20 > > [ 392.786749] i40e_finalize_xdp_rx+0x44/0x50 [i40e] > > virt 2.2.2.10 > > [ 392.843370] i40e_clean_rx_irq_zc+0x132/0x500 [i40e] > > [ 392.918465] i40e_napi_poll+0x119/0x1270 [i40e] > > > > 2024-05-29T01:0[ 392.972027] ? scheduler_tick+0x9f/0xd0 > > [ 393.033806] ? tick_sched_do_timer+0x40/0x40 > > [ 393.084300] __napi_poll+0x23/0x160 > > [ 393.125556] net_rx_action+0x232/0x290 > > [ 393.169869] __do_softirq+0xd0/0x270 > > [ 393.212141] irq_exit_rcu+0x74/0xa0 > > [ 393.253387] common_interrupt+0x62/0xa0 > > [ 393.298734] </IRQ> > > [ 393.323571] <TASK> > > [ 393.348409] asm_common_interrupt+0x27/0x40 > > [ 393.397871] RIP: 0010:__fget_light+0x43/0x100 > > [ 393.449396] Code: b0 0b 00 00 8b 08 83 f9 01 75 36 48 8b 40 20 8b 18 39 df 73 1f 89 fa 48 39 da 48 19 db 48 8b 40 08 21 df 48 8d 04 f8 48 8b 00 <48> 85 c0 74 05 85 70 44 74 02 31 c0 5b 41 5c 41 5d 41 5e 41 5f 5d > > [ 393.671279] RSP: 0018:ffffae5c81057a98 EFLAGS: 00000202 > > [ 393.733054] RAX: ffff8bb083541100 RBX: ffffffffffffffff RCX: 0000000000000001 > > [ 393.817413] RDX: 000000000000002f RSI: 0000000000004000 RDI: 000000000000002f > > [ 393.901777] RBP: ffffae5c81057ac0 R08: 000000010000002f R09: 0000000000000000 > > [ 393.986133] R10: ffffae5c81057f08 R11: 0000000000000000 R12: 0000000000000000 > > [ 394.070506] R13: ffffae5c81057b54 R14: ffffae5c81057b4c R15: ffffae5c81057b4c > > [ 394.154884] __fdget+0xe/0x10 > > [ 394.189993] do_sys_poll+0x1fd/0x5e0 > > [ 394.232287] ? step_into+0x11f/0x750 > > [ 394.274570] ? common_interrupt+0x8e/0xa0 > > [ 394.321987] ? common_interrupt+0x8e/0xa0 > > [ 394.369406] ? asm_common_interrupt+0x27/0x40 > > [ 394.420929] ? xsk_flush+0x34/0x40 > > [ 394.461166] ? xsk_tx_peek_release_desc_batch+0x24c/0x2e0 > > [ 394.525014] ? i40e_clean_rx_irq_zc+0x342/0x500 [i40e] > > [ 394.585763] ? i40e_xsk_wakeup+0xa2/0xc0 [i40e] > > [ 394.639332] ? xsk_xmit+0x6d/0x6c0 > > [ 394.679569] ? i40e_napi_poll+0xc39/0x1270 [i40e] > > [ 394.735202] ? xsk_sendmsg+0xf4/0x100 > > [ 394.778498] ? sock_sendmsg+0x2e/0x40 > > [ 394.821808] ? __sys_sendto+0x13a/0x170 > > [ 394.867166] ? __snd_timer_user_ioctl+0x9e0/0xaa0 > > [ 394.922812] ? net_rx_action+0x232/0x290 > > [ 394.969206] __x64_sys_poll+0xa0/0x130 > > [ 395.013534] ? __x64_sys_poll+0xa0/0x130 > > [ 395.059930] do_syscall_64+0x34/0xb0 > > [ 395.102213] entry_SYSCALL_64_after_hwframe+0x44/0xae > > [ 395.161948] RIP: 0033:0x7ffff76f0d47 > > [ 395.204222] Code: 00 00 00 5b 49 8b 45 10 5d 41 5c 41 5d 41 5e c3 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 07 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 89 54 24 1c 48 89 74 24 10 > > [ 395.426119] RSP: 002b:00007fffffffe6c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000007 > > [ 395.515625] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007ffff76f0d47 > > [ 395.599996] RDX: 0000000000000000 RSI: 0000000000000001 RDI: 00007fffffffe6f8 > > [ 395.684371] RBP: 00007ffff7e85f80 R08: 00000000173eed80 R09: 0000000000000051 > > [ 395.768741] R10: 000001193c2583df R11: 0000000000000246 R12: 0000000000000000 > > [ 395.853108] R13: 00007ffff7e85fb0 R14: 0000000000000000 R15: 0000000000000000 > > [ 395.937488] </TASK> > > [ 395.963347] Modules linked in: uio i40e ec(PO) evdev ncs_acpi(O) bonding sr_mod cdrom usb_storage dwmac_intel stmmac i2c_i801 pcs_xpcs phylink i2c_smbus [last unloaded: i40e] > > 0:47-08:00 NOTIC[ 396.147236] CR2: 0000000000000008 > > [ 396.202839] ---[ end trace 9b309a97c006510f ]--- > > E slb: real server r16, IP 4.4.4.16 operational > > > > 2024-05-29T01:00:47-08:00 NOTICE slb: real [ 396.344228] RIP: 0010:xsk_flush+0xb/0x40 > > [ 396.406975] Code: a0 03 00 00 01 b8 e4 ff ff ff eb dc 49 83 85 a0 03 00 00 01 b8 e4 ff ff ff eb cd 0f 1f 40 00 48 8b 87 40 03 00 00 55 48 89 e5 <8b> 50 08 48 8b 40 10 89 10 48 8b 87 68 03 00 00 48 8b 80 80 00 00 > > [ 396.628859] RSP: 0018:ffffae5c80244d48 EFLAGS: 00010287 > > [ 396.690632] RAX: 0000000000000000 RBX: ffff8bb80f8ab768 RCX: ffff8bb80f8abac0 > > [ 396.774985] RDX: ffff8bb80f8abac0 RSI: 0000000000000004 RDI: ffff8bb80f8ab768 > > [ 396.859346] RBP: ffffae5c80244d48 R08: ffff8bb0845f0000 R09: 000000000000967a > > [ 396.943694] R10: ffff8bb0833fdc00 R11: 000000000ca5a180 R12: ffff8bb80f8ab768 > > [ 397.028064] R13: ffff8bb80f8ebac0 R14: ffff8bb081571b00 R15: 0000000000000132 > > [ 397.112433] FS: 00007ffff7e97a80(0000) GS:ffff8bb80f8c0000(0000) knlGS:0000000000000000 > > [ 397.208103] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > [ 397.276061] CR2: 0000000000000008 CR3: 000000010d84b000 CR4: 0000000000750ee0 > > [ 397.360414] PKRU: 55555554 > > [ 397.392419] Kernel panic - not syncing: Fatal exception in interrupt > > [ 397.467554] Kernel Offset: 0x34600000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) > > [ 397.678387] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---