Intel e810 (ice driver) + AF_XDP: list_add corruption

Hi.

I'm running a custom AF_XDP application (unfortunately I'm not allowed to share it), and I see the following call trace when running it with an Intel e810 NIC (ice driver). The crash happens when my application closes its AF_XDP sockets. There are 2 AF_XDP sockets in the application, and they use different channels of the same network interface.

My application runs properly in non-zerocopy mode, and in zerocopy mode with the following NICs/drivers:
- Intel x710 (i40e driver)
- Mellanox 5 (mlx5_core driver)
- various Solarflare NICs (sfc driver)

Here is the kernel log:

[  391.084249] list_add corruption. prev->next should be next (ffff973474ebd4f0), but was ffff973474ebb880. (prev=ffff973452662450).
[  391.084249] list_del corruption. next->prev should be ffff973474eb79f0, but was ffff973474ecf630. (next=ffff973474ecf630)
[  391.084258] ------------[ cut here ]------------
[  391.084265] kernel BUG at lib/list_debug.c:62!
[  391.084265] ------------[ cut here ]------------
[  391.084269] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
[  391.084270] kernel BUG at lib/list_debug.c:30!
[  391.084276] CPU: 3 PID: 31 Comm: ksoftirqd/3 Tainted: G OE 6.1.5-sasha1 #6
[  391.084279] Hardware name: System manufacturer System Product Name/PRIME X299-A II, BIOS 0901 11/06/2020
[  391.084282] RIP: 0010:__list_del_entry_valid.cold+0x23/0x6f
[  391.084289] Code: e8 55 a0 fe ff 0f 0b 48 89 fe 48 c7 c7 08 d3 54 88 e8 44 a0 fe ff 0f 0b 48 89 d1 48 c7 c7 28 d4 54 88 4c 89 c2 e8 30 a0 fe ff <0f> 0b 48 89 f2 48 89 fe 48 c7 c7 d8 d3 54 88 e8 1c a0 fe ff 0f 0b
[  391.084293] RSP: 0018:ffffaf93c02a7c70 EFLAGS: 00010246
[  391.084296] RAX: 000000000000006d RBX: ffff973474eb79f0 RCX: 0000000000000000
[  391.084299] RDX: 0000000000000000 RSI: ffffffff8853538e RDI: 00000000ffffffff
[  391.084302] RBP: ffff973452662400 R08: ffffffff88c622c0 R09: 000000000000000f
[  391.084304] R10: 000000000000000f R11: ffffffff8958bb2e R12: 0000000000000011
[  391.084306] R13: ffff9734a38b42d0 R14: ffff9734a38b4298 R15: 0000000000000200
[  391.084309] FS: 0000000000000000(0000) GS:ffff97439fd80000(0000) knlGS:0000000000000000
[  391.084312] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  391.084315] CR2: 000056067c3f20a0 CR3: 0000000340c10001 CR4: 00000000003706e0
[  391.084318] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  391.084320] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  391.084322] Call Trace:
[  391.084325]  <TASK>
[  391.084327]  xp_alloc_batch+0x24d/0x2c0
[  391.084334]  __ice_alloc_rx_bufs_zc+0xfc/0x170 [ice]
[  391.084368]  ice_clean_rx_irq_z
[  391.084388]  ice_napi_poll+0x47f/0x680 [ice]
[  391.084408]  __napi_poll+0x29/0x160
[  391.084413]  net_rx_action+0x2a2/0x360
[  391.084417]  __do_softirq+0xe9/0x2e9
[  391.084420]  run_ksoftirqd+0x34/0x40
[  391.084425]  smpboot_thread_fn+0x185/0x220
[  391.084429]  ? sort_range+0x20/0x20
[  391.084432]  kthread+0xe5/0x110
[  391.084435]  ? kthread_complete_and_exit+0x20/0x20
[  391.084438]  ret_from_fork+0x1f/0x30
[  391.084443]  </TASK>
[  391.084446] Modules linked in: onload(OE) sfc_char(OE) sfc_resource(OE) netconsole(E) cts(E) rpcsec_gss_krb5(E) nfsv4(E) dns_resolver(E) nfs(E) fscache(E) netfs(E) intel_rapl_msr(E) intel_rapl_common(E) nfit(E) libnvdimm(E) x86_pkg_temp_thermal(E) intel_powerclamp(E) nouveau(E) kvm_intel(E) kvm(E) ghash_clmulni_intel(E) sha512_ssse3(E) sha512_generic(E) snd_hda_codec_realtek(E) drm_display_helper(E) snd_hda_codec_hdmi(E) cec(E) aesni_intel(E) rc_core(E) snd_hda_codec_generic(E) eeepc_wmi(E) drm_ttm_helper(E) crypto_simd(E) cryptd(E) asus_wmi(E) snd_hda_intel(E) ttm(E) platform_profile(E) irdma(E) battery(E) rapl(E) sparse_keymap(E) drm_kms_helper(E) snd_intel_dspcfg(E) snd_intel_sdw_acpi(E) i40e(E) i2c_algo_bit(E) intel_cstate(E) ledtrig_audio(E) evdev(E) intel_uncore(E) snd_hda_codec(E) rfkill(E) efi_pstore(E) pcspkr(E) video(E) ib_uverbs(E) iTCO_wdt(E) intel_wmi_thunderbolt(E) snd_hda_core(E) wmi_bmof(E) intel_pmc_bxt(E) snd_hwdep(E) iTCO_vendor_support(E) ib_core(E) sg(E) snd_pcm(E)
[  391.084485]  watchdog(E) ioatdma(E) snd_timer(E) mei_me(E) snd(E) mei(E) dca(E) soundcore(E) button(E) acpi_tad(E) mxm_wmi(E) nct6775(E) nct6775_core(E) hwmon_vid(E) coretemp(E) vfio_pci(E) vfio_pci_core(E) vfio_virqfd(E) vfio_iommu_type1(E) vfio(E) irqbypass(E) nfsd(E) uio_pci_generic(E) uio(E) nfs_acl(E) lockd(E) grace(E) auth_rpcgss(E) configfs(E) sunrpc(E) fuse(E) drm(E) efivarfs(E) ip_tables(E) x_tables(E) autofs4(E) ext4(E) crc32c_generic(E) crc16(E) mbcache(E) jbd2(E) uas(E) usb_storage(E) hid_generic(E) usbhid(E) hid(E) dm_mod(E) sd_mod(E) nvme(
[  391.084543] invalid opcode: 0000 [#2] PREEMPT SMP NOPTI
[  391.084545] ---[ end trace 0000000000000000 ]---
[  391.084547] CPU: 8 PID: 24541 Comm: kworker/8:0 Tainted: G D OE 6.1.5-sasha1 #6
[  391.084554] Hardware name: System manufacturer System Product Name/PRIME X299-A II, BIOS 0901 11/06/2020
[  391.084557] Workqueue: events xp_release_deferred
[  391.084563] RIP: 0010:__list_add_valid.cold+0x3a/0x5b
[  391.084568] Code: f2 4c 89 c1 48 89 fe 48 c7 c7 d0 d2 54 88 e8 8b a0 fe ff 0f 0b 48 89 d1 4c 89 c6 4c 89 ca 48 c7 c7 78 d2 54 88 e8 74 a0 fe ff <0f> 0b 4c 89 c1 48 c7 c7 20 d2 54 88 e8 63 a0 fe ff 0f 0b 48 c7 c7
[  391.084572] RSP: 0018:ffffaf93ce43fd40 EFLAGS: 00010246
[  391.084576] RAX: 0000000000000075 RBX: ffff973474ebe8e8 RCX: 0000000000000000
[  391.084578] RDX: 0000000000000000 RSI: ffffffff8853538e RDI: 00000000ffffffff
[  391.084581] RBP: ffff973474ebe940 R08: 0000000000000001 R09: 0000000000000019
[  391.084583] R10: 0000000000000729 R11: 6c65645f7473696c R12: ffff973452662400
[  391.084586] R13: ffff973474ebd4f0 R14: ffff973452662450 R15: ffff973452662d00
[  391.084588] FS: 0000000000000000(0000) GS:ffff9743a0000000(0000) knlGS:0000000000000000
[  391.084592] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  391.084594] CR2: 00007f6c0515b198 CR3: 0000000340c10004 CR4: 00000000003706e0
[  391.084597] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  391.084599] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7
[  391.084602] Call Trace:
[  391.084604]  <TASK>
[  391.084606]  xp_free+0x51/0x80
[  391.084611]  ice_xsk_clean_rx_ring+0x39/0x60 [ice]
[  391.084638]  ice_clean_rx_ring+0x152/0x170 [ice]
[  391.084659]  ice_xsk_pool_setup+0x5ed/0x7b0 [ice]
[  391.084679]  xp_disable_drv_zc+0x60/0xd0
[  391.084682]  ? __schedule+0x30e/0xa40
[  391.084686]  xp_release_deferred+0x22/0xb0
[  391.084689]  process_one_work+0x1e2/0x3b0
[  391.084694]  ? rescuer_thread+0x390/0x390
[  391.084698]  worker_thread+0x50/0x3a0
[  391.084701]  ? rescuer_thread+0x390/0x390
[  391.084705]  kthread+0xe5/0x110
[  391.084708]  ? kthread_complete_and_exit+0x20/0x20
[  391.084711]  ret_from_fork+0x1f/0x30
[  391.084716]  </TASK>
[  391.084717] Modules linked in: onload(OE) sfc_char(OE) sfc_resource(OE) netconsole(E) cts(E) rpcsec_gss_krb5(E) nfsv4(E) dns_resolver(E) nfs(E) fscache(E) netfs(E) intel_rapl_msr(E) intel_rapl_common(E) nfit(E) libnvdimm(E) x86_pkg_temp_thermal(E) intel_powerclamp(E) nouveau(E) kvm_intel(E) kvm(E) ghash_clmulni_intel(E) sha512_ssse3(E) sha512_generic(E) snd_hda_codec_realtek(E) drm_display_helper(E) snd_hda_codec_hdmi(E) cec(E) aesni_intel(E) rc_core(E) snd_hda_codec_generic(E) eeepc_wmi(E) drm_ttm_helper(E) crypto_simd(E) cryptd(E) asus_wmi(E) snd_hda_intel(E) ttm(E) platform_profile(E) irdma(E) battery(E) rapl(E) sparse_keymap(E) drm_kms_helper(E) snd_intel_dspcfg(E) snd_intel_sdw_acpi(E) i40e(E) i2c_algo_bit(E) intel_cstate(E) ledtrig_audio(E) evdev(E) intel_uncore(E) snd_hda_codec(E) rfkill(E) efi_pstore(E) pcspkr(E) video(E) ib_uverbs(E) iTCO_wdt(E) intel_wmi_thunderbolt(E) snd_hda_core(E) wmi_bmof(E) intel_pmc_bxt(E) snd_hwdep(E) iTCO_vendor_support(E) ib_core(E) sg(E) snd_pcm(E)
[  391.084749]  watchdog(E) ioatdma(E) snd_timer(E) mei_me(E) snd(E) mei(E) dca(E) soundcore(E) button(E) acpi_tad(E) mxm_wmi(E) nct6775(E) nct6775_core(E) hwmon_vid(E) coretemp(E) vfio_pci(E) vfio_pci_core(E) vfio_virqfd(E) vfio_iommu_type1(E) vfio(E) irqbypass(E) nfsd(E) uio_
[  391.084803] ---[ end trace 0000000000000000 ]---
[  391.111804] RIP: 0010:__list_del_entry_valid.cold+0x23/0x6f
[  391.111812] Code: e8 55 a0 fe ff 0f 0b 48 89 fe 48 c7 c7 08 d3 54 88 e8 44 a0 fe ff 0f 0b 48 89 d1 48 c7 c7 28 d4 54 88 4c 89 c2 e8 30 a0 fe ff <0f> 0b 48 89 f2 48 89 fe 48 c7 c7 d8 d3 54 88 e8 1c a0 fe ff 0f 0b
[  391.111817] RSP: 0018:ffffaf93c02a7c70 EFLAGS: 00010246
[  391.111821] RAX: 000000000000006d RBX: ffff973474eb79f0 RCX: 0000000000000000
[  391.111824] RDX: 0000000000000000 RSI: ffffffff8853538e RDI: 00000000ffffffff
[  391.111826] RBP: ffff973452662400 R08: ffffffff88c622c0 R09: 000000000000000f
[  391.111829] R10: 000000000000000f R11: ffffffff8958bb2e R12: 0000000000000011
[  391.111832] R13: ffff9734a38b42d0 R14: ffff9734a38b4298 R15: 0000000000000200
[  391.111835] FS: 0000000000000000(0000) GS:ffff97439fd80000(0000) knlGS:0000000000000000
[  391.111838] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  391.111840] CR2: 000056067c3f20a0 CR3: 00000002b4e0a006 CR4: 00000000003706e0
[  391.111843] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  391.111846] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  391.111848] Kernel panic - not syncing: Fatal exception in interrupt
[  391.140375] Kernel Offset: 0x6200000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[  391.169096] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---
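
Two CPUs oopsed at nearly the same time here (ksoftirqd/3 and kworker/8:0), and captures of such simultaneous oopses can end up with several printk records run together on one line. In case it helps anyone reading a raw capture like this, below is a minimal sketch that splits text into one record per line. It assumes the usual `[ seconds.microseconds]` printk timestamp prefix; `split_records` is a hypothetical helper name of mine, not part of any kernel tooling, and it cannot restore text that was already truncated in the capture.

```python
import re

# printk timestamp prefix such as "[  391.084249]" (assumption: 6-digit microseconds)
TS = re.compile(r"\[\s*\d+\.\d{6}\]")

def split_records(text):
    """Return a list of records, each starting at its own timestamp."""
    records = []
    for line in text.splitlines():
        starts = [m.start() for m in TS.finditer(line)]
        if not starts:
            # No timestamp at all: keep non-empty lines unchanged.
            if line.strip():
                records.append(line.strip())
            continue
        # Keep any stray text that precedes the first timestamp.
        head = line[:starts[0]].strip()
        if head:
            records.append(head)
        # Cut the line at every timestamp boundary.
        for begin, end in zip(starts, starts[1:] + [len(line)]):
            records.append(line[begin:end].rstrip())
    return records
```

Bracketed tokens such as `[ice]` or `[#1]` do not match the timestamp pattern, so they stay inside their record. Calling `split_records()` on the saved log text and printing each returned item gives one record per line.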

--
Alexandra N. Kossovsky
OKTET Labs (http://www.oktetlabs.ru/)


