We have experienced a kernel BPF null pointer dereference issue on all
our machines since mid-June. It might be related to an upgrade of
libvirt/kvm/qemu at that time, but we are not sure.
None of the servers can be used with this bug, as they crash within
one hour of reboot at the latest. The time until the kernel panic can be
easily reduced to 2 minutes by starting one or more of the following
applications:
- LXD daemon (4.2.1)
- libvirtd daemon (6.4.0) with qemu/kvm guests
- NFS server 2.5.1
- Mozilla Firefox
- Mozilla Thunderbird
If none of these applications is running, the systems seem to be stable.
Interim solution:
Downgrade the Linux kernel to 4.9.226 LTS or 4.4.226 LTS on all the machines.
Why this solution works is not yet clear. One of the major
differences we saw is that both kernel packages have been configured
with user namespaces disabled.
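To compare kernel builds on this point, the relevant options can be read from each kernel's config file. The following is a minimal sketch, not a tool used in this report: the helper name `read_config_options` and the sample text are hypothetical, and the choice of `CONFIG_USER_NS` and `CONFIG_CGROUP_BPF` as the options worth checking is an assumption based on the symptoms described here.

```python
import re

def read_config_options(config_text, names):
    """Return {name: value} for the requested options.

    Options that are absent or marked "is not set" are reported as "n".
    """
    found = {name: "n" for name in names}
    for line in config_text.splitlines():
        match = re.match(r"(CONFIG_\w+)=(.*)$", line.strip())
        if match and match.group(1) in found:
            found[match.group(1)] = match.group(2)
    return found

# Illustrative sample only; this is NOT the config of the affected machines.
SAMPLE = """\
CONFIG_CGROUPS=y
# CONFIG_USER_NS is not set
CONFIG_CGROUP_BPF=y
"""

if __name__ == "__main__":
    wanted = ["CONFIG_USER_NS", "CONFIG_CGROUP_BPF"]
    for name, value in read_config_options(SAMPLE, wanted).items():
        print(name, value)
```

On a running system the text could come from `/proc/config.gz` (read with `gzip.open(path, "rt")`), provided the kernel was built with `CONFIG_IKCONFIG_PROC`; otherwise the packaged config file has to be used.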
We experienced the kernel freeze on the following Manjaro (Arch Linux-based) kernels:
- 5.7.0 (5.7.0-3-MANJARO x64)
- 5.6.16 (5.6.16-1-MANJARO x64)
- 5.4.44 (5.4.44-1-MANJARO x64)
- 4.19.126 (4.19.126-1-MANJARO x64)
- 4.14.183 (4.14.183-1-MANJARO x64)
Kernel configs can be taken from
https://gitlab.manjaro.org/packages/core.
Subsequent e-mails will contain the relevant extracts from journal or
netconsole logs.
Help and support on this issue are welcome.
A fix is under discussion here:
https://lore.kernel.org/netdev/20200616180352.18602-1-xiyou.wangcong@xxxxxxxxx/
Thanks,
Daniel
Dear Daniel,
thank you for the hint. I will try to follow the discussion. For your
convenience I have added some of our many logs to this
thread. Maybe they will be of some help to the team.
Below you will find one log from kernel 4.14, which may show a
different issue. Do we need another thread, or do you judge it to have
the same root cause?
Kernel 4.14.183 (4.14.183-1-MANJARO x64)
BUG: unable to handle kernel paging request at 0000200000000002
IP: __cgroup_bpf_run_filter_skb+0xca/0x1b0
PGD 0 P4D 0
Oops: 0000 [#1] PREEMPT SMP PTI
Modules linked in: rpcsec_gss_krb5 vhost_net vhost tap tun
ebtable_filter ebtables devlink ip6table_filter ip6_tables
iptable_filter fuse netconsole bridge stp llc nct6775 hwmon_vid
nls_iso8859_1 nls_cp437 vfat fat input_leds joydev mousedev
snd_hda_codec_hdmi eeepc_wmi iTCO_wdt asus_wmi mei_wdt sparse_keymap
rfkill intel_rapl iTCO_vendor_support led_class wmi_bmof
x86_pkg_temp_thermal intel_powerclamp coretemp evdev mac_hid kvm_intel
i915 snd_hda_codec_realtek snd_hda_codec_generic kvm irqbypass
crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc snd_hda_intel
aesni_intel aes_x86_64 crypto_simd glue_helper cryptd i2c_algo_bit
snd_hda_codec intel_cstate drm_kms_helper pcspkr snd_hda_core
intel_rapl_perf snd_hwdep e1000e snd_pcm r8169 i2c_i801 intel_gtt mii
syscopyarea snd_timer sysfillrect
sysimgblt snd ptp lpc_ich mei_me fb_sys_fops soundcore shpchp pps_core
mei wmi thermal fan pcc_cpufreq video button sch_fq_codel nfsd
auth_rpcgss oid_registry drm nfs_acl lockd grace agpgart sunrpc
ip_tables x_tables ext4 crc16 mbcache jbd2 fscrypto dm_thin_pool
dm_persistent_data libcrc32c crc32c_generic dm_bio_prison dm_bufio
hid_generic hid_logitech_hidpp dm_mod hid_logitech_dj usbhid hid sr_mod
sd_mod cdrom ahci libahci ehci_pci xhci_pci ehci_hcd libata xhci_hcd
crc32c_intel scsi_mod usbcore usb_common
CPU: 0 PID: 1313 Comm: vhost-1306 Not tainted 4.14.183-1-MANJARO #1
Hardware name: ASUS All Series/CS-B, BIOS 3602 03/26/2018
task: ffff90a042548000 task.stack: ffff9c4e82b4c000
RIP: 0010:__cgroup_bpf_run_filter_skb+0xca/0x1b0
RSP: 0018:ffff9c4e82b4f9a8 EFLAGS: 00010296
RAX: ffff909fa973804e RBX: ffff909efbb6d800 RCX: 0000000000000001
RDX: ffff909fa973804e RSI: ffff909efbb6d800 RDI: ffff90a06c0a2000
RBP: 0000000000000014 R08: 0000000000000001 R09: ffff90a06c0a2000
R10: 000000000000af02 R11: 000000000300a8c0 R12: 0000200000000000
R13: 0000000000000000 R14: 0000000000000014 R15: ffff909fa973804e
FS: 0000000000000000(0000) GS:ffff90a09fa00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000200000000002 CR3: 00000003c2438001 CR4: 00000000001626f0
Call Trace:
sk_filter_trim_cap+0xd1/0x1a0
tcp_v4_rcv+0x921/0xbc0
? ip_local_deliver+0xbf/0x120
ip_local_deliver_finish+0x66/0x200
__netif_receive_skb_core+0x35e/0xb40
? nf_hook_slow+0x3f/0xb0
netif_receive_skb_internal+0x4b/0x130
br_handle_frame_finish+0x148/0x510 [bridge]
? try_to_wake_up+0x54/0x4a0
? br_handle_frame_finish+0x510/0x510 [bridge]
br_handle_frame+0x146/0x330 [bridge]
__netif_receive_skb_core+0x3e9/0xb40
? __skb_get_hash_symmetric+0x74/0xc0
netif_receive_skb_internal+0x4b/0x130
tun_get_user+0x956/0xf00 [tun]
? __switch_to_asm+0x35/0x70
? __switch_to_asm+0x41/0x70
? __switch_to_asm+0x35/0x70
? __switch_to_asm+0x41/0x70
tun_sendmsg+0x60/0x90 [tun]
handle_tx+0x360/0x5f0 [vhost_net]
vhost_worker+0xa7/0x100 [vhost]
kthread+0x102/0x140
? vhost_dev_reset_owner+0x50/0x50 [vhost]
? kthread_create_on_node+0x60/0x60
ret_from_fork+0x35/0x40
Code: 00 00 48 03 93 d0 00 00 00 4c 8b 6b 18 48 89 6b 18 49 89 c6 49 29
d6 44 01 b3 80 00 00 00 44 89 f5 48 29 e8 48 89 83 d8 00 00 00 <41> f6
44 24 02 08 75 7c 49 8b 44 24 28 49 8d 74 24 30 48 89 df
RIP: __cgroup_bpf_run_filter_skb+0xca/0x1b0 RSP: ffff9c4e82b4f9a8
CR2: 0000200000000002
---[ end trace cb04f0196a7eba73 ]---
Kernel panic - not syncing: Fatal exception in interrupt
Kernel Offset: 0x3a000000 from 0xffffffff81000000 (relocation range:
0xffffffff80000000-0xffffffffbfffffff)
---[ end Kernel panic - not syncing: Fatal exception in interrupt
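The register dump above can be cross-checked against the faulting address. The marked bytes in the Code line, `<41> f6 44 24 02 08`, decode to `testb $0x8,0x2(%r12)`, so the access should be at R12 + 2. The sketch below parses an excerpt of the dump and confirms this; the helper names `parse_registers` and `is_kernel_half` are hypothetical, and the check only describes the bad pointer. It cannot tell where the pointer came from.

```python
import re

# Excerpt copied from the 4.14.183 oops above.
OOPS_EXCERPT = """\
RAX: ffff909fa973804e RBX: ffff909efbb6d800 RCX: 0000000000000001
RBP: 0000000000000014 R08: 0000000000000001 R09: ffff90a06c0a2000
R10: 000000000000af02 R11: 000000000300a8c0 R12: 0000200000000000
CR2: 0000200000000002 CR3: 00000003c2438001 CR4: 00000000001626f0
"""

def parse_registers(text):
    """Map register names (RAX, R12, CR2, ...) to integer values."""
    pairs = re.findall(r"\b([A-Z][A-Z0-9]{1,2}): ([0-9a-f]{16})\b", text)
    return {name: int(value, 16) for name, value in pairs}

def is_kernel_half(addr):
    """True if addr lies in the upper canonical half of the x86-64 space."""
    return (addr >> 47) == (1 << 17) - 1

if __name__ == "__main__":
    regs = parse_registers(OOPS_EXCERPT)
    base, fault = regs["R12"], regs["CR2"]
    print("CR2 - R12 =", fault - base)
    print("R12 is a kernel-half address:", is_kernel_half(base))
    print("R12 has exactly one bit set:", base & (base - 1) == 0)
```

In this trace R12 is not a kernel address at all, while RBX, RDI and the stack pointers are. That matches a dereference of an invalid pointer rather than a NULL pointer, which may be relevant to the question of whether this is the same root cause.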