[Bug 209253] Loss of connectivity on guest after important host <-> guest traffic

https://bugzilla.kernel.org/show_bug.cgi?id=209253

Martin (kdev@xxxxxxxxx) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |kdev@xxxxxxxxx

--- Comment #2 from Martin (kdev@xxxxxxxxx) ---
I am seeing problems that are possibly related to Clement's, and likely related
to Ian's.
I am running KVM on a machine with two Nvidia GPUs, passing one GPU through to
the KVM guest (Ubuntu 20.04.1).
The setup ran stably for quite a while on Fedora 31 (5.7.15-100). After
upgrading to Fedora 32 (5.8.13), the system tends to run well for a few hours
after a host reboot and then produces the oopses below. After an oops, VirtIO
drives, VirtIO network cards, and PCIe passthrough tend to hang indefinitely
within minutes of rebooting the guest, which usually makes the guest inoperable
(unless it uses only non-VirtIO devices and no GPU passthrough). Rebooting the
host makes things work again for a few hours, until the next oops happens. I am
on 5.8.14 now and see the same problem.

Oops 1 (I saw this twice, once on 5.8.13-200 and once on 5.8.14-200):

WARNING: CPU: 28 PID: 17651 at fs/eventfd.c:74 eventfd_signal+0x88/0xa0
Modules linked in: vhost_net vhost tap vhost_iotlb v4l2loopback(OE) xt_nat
xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_nat_tftp nft_objref
nf_conntrack_tftp tun bridge stp llc evdi(OE) vboxnetadp(OE) vboxnetflt(OE)
vboxdrv(OE) nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet
nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_tables
ebtable_nat ebtable_broute ip6table_nat ip6table_mangle ip6table_raw
ip6table_security iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4
iptable_mangle iptable_raw iptable_security ip_set nfnetlink ebtable_filter
ebtables ip6table_filter ip6_tables iptable_filter sunrpc ucsi_ccg typec_ucsi
nvidia_drm(POE) typec nvidia_modeset(POE) snd_hda_codec_realtek nvidia_uvm(OE)
snd_hda_codec_generic ledtrig_audio snd_hda_codec_hdmi btusb edac_mce_amd btrtl
btbcm snd_hda_intel uvcvideo iwlmvm snd_intel_dspcfg kvm_amd btintel
snd_usb_audio snd_hda_codec videobuf2_vmalloc videobuf2_memops snd_usbmidi_lib
mac80211
 nvidia(POE) kvm bluetooth snd_hda_core videobuf2_v4l2 snd_rawmidi snd_hwdep
videobuf2_common libarc4 snd_seq iwlwifi videodev joydev rapl snd_seq_device
ecdh_generic wmi_bmof pcspkr cfg80211 mc ecc snd_pcm drm_kms_helper snd_timer
sp5100_tco k10temp snd i2c_piix4 rfkill soundcore cec i2c_nvidia_gpu gpio_amdpt
gpio_generic acpi_cpufreq drm ip_tables dm_crypt hid_lenovo mxm_wmi
crct10dif_pclmul crc32_pclmul crc32c_intel nvme ghash_clmulni_intel nvme_core
igb wacom ccp uas dca usb_storage i2c_algo_bit wmi pinctrl_amd vfio_pci
irqbypass vfio_virqfd vfio_iommu_type1 vfio fuse
CPU: 28 PID: 17651 Comm: CPU 5/KVM Tainted: P           OE    
5.8.13-200.fc32.x86_64 #1
Hardware name: Gigabyte Technology Co., Ltd. X399 DESIGNARE EX/X399 DESIGNARE
EX-CF, BIOS F12 12/11/2019
RIP: 0010:eventfd_signal+0x88/0xa0
Code: 03 00 00 00 4c 89 f7 e8 26 16 db ff 65 ff 0d 3f f3 ca 4b 4c 89 ee 4c 89
f7 e8 34 8e 7f 00 4c 89 e0 5b 5d 41 5c 41 5d 41 5e c3 <0f> 0b 45 31 e4 5b 5d 4c
89 e0 41 5c 41 5d 41 5e c3 0f 1f 80 00 00
RSP: 0018:ffffab10c8db7bb0 EFLAGS: 00010286
RAX: 00000000ffffffff RBX: ffff9a71e16b8000 RCX: 0000000000000004
RDX: 00000000c8088704 RSI: 0000000000000001 RDI: ffff9a8335656580
RBP: ffffab10c8db7c18 R08: ffff9a72f7d120a0 R09: 00000000c8088708
R10: 0000000000000000 R11: 0000000000000014 R12: 0000000000000001
R13: ffff9a72a3153448 R14: ffff9a72f7d120a0 R15: ffff9a72a3153448
FS:  0000000000000000(0000) GS:ffff9a7e7f280000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f64e403f024 CR3: 000000041b5f4000 CR4: 00000000003406e0
Call Trace:
 ioeventfd_write+0x51/0x80 [kvm]
 __kvm_io_bus_write+0x88/0xb0 [kvm]
 kvm_io_bus_write+0x43/0x60 [kvm]
 write_mmio+0x70/0xf0 [kvm]
 emulator_read_write_onepage+0x11e/0x330 [kvm]
 emulator_read_write+0xca/0x180 [kvm]
 segmented_write.isra.0+0x4a/0x60 [kvm]
 x86_emulate_insn+0x850/0xe60 [kvm]
 x86_emulate_instruction+0x2c7/0x780 [kvm]
 ? kvm_set_cr8+0x1e/0x40 [kvm]
 kvm_arch_vcpu_ioctl_run+0xeb9/0x1770 [kvm]
 ? x86_pmu_enable+0x106/0x2f0
 ? __switch_to_xtra+0x495/0x500
 kvm_vcpu_ioctl+0x209/0x590 [kvm]
 ksys_ioctl+0x82/0xc0
 __x64_sys_ioctl+0x16/0x20
 do_syscall_64+0x4d/0x90
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7f5f6a84f3bb
Code: 0f 1e fa 48 8b 05 dd aa 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff
c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01
c3 48 8b 0d ad aa 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007f5f527fb668 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 000055c459b6f1f0 RCX: 00007f5f6a84f3bb
RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 0000000000000022
RBP: 00007f5f6bcca000 R08: 000055c45750abf0 R09: 000000003b9aca00
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000001
R13: 00007f5f6bccb004 R14: 0000000000000000 R15: 000055c4579a4a00

Oops 2 (saw this once on 5.8.14-200):

WARNING: CPU: 24 PID: 0 at fs/eventfd.c:74 eventfd_signal+0x88/0xa0
Modules linked in: v4l2loopback(OE) nfnetlink_queue nfnetlink_log vhost_net
vhost tap vhost_iotlb xt_nat xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT
nf_nat_tftp nft_objref nf_conntrack_tftp tun bridge stp llc vboxnetadp(OE)
vboxnetflt(OE) vboxdrv(OE) nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib
nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat
nf_tables ebtable_nat ebtable_broute ip6table_nat ip6table_mangle ip6table_raw
ip6table_security iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4
iptable_mangle iptable_raw iptable_security ip_set nfnetlink ebtable_filter
ebtables ip6table_filter ip6_tables iptable_filter sunrpc nvidia_drm(POE)
nvidia_modeset(POE) iwlmvm nvidia_uvm(OE) snd_hda_codec_realtek ucsi_ccg
typec_ucsi mac80211 typec edac_mce_amd snd_hda_codec_generic ledtrig_audio
snd_hda_codec_hdmi uvcvideo btusb btrtl btbcm nvidia(POE) snd_hda_intel
videobuf2_vmalloc libarc4 kvm_amd btintel videobuf2_memops snd_intel_dspcfg
 snd_hda_codec videobuf2_v4l2 kvm bluetooth videobuf2_common snd_usb_audio
iwlwifi snd_hda_core videodev snd_usbmidi_lib snd_hwdep snd_seq snd_rawmidi
joydev rapl snd_seq_device ecdh_generic mc pcspkr wmi_bmof ecc cfg80211 snd_pcm
drm_kms_helper snd_timer snd sp5100_tco i2c_piix4 k10temp rfkill soundcore cec
i2c_nvidia_gpu gpio_amdpt gpio_generic acpi_cpufreq drm ip_tables dm_crypt
mxm_wmi crct10dif_pclmul crc32_pclmul crc32c_intel nvme ghash_clmulni_intel igb
nvme_core wacom uas dca hid_lenovo ccp usb_storage i2c_algo_bit wmi pinctrl_amd
vfio_pci irqbypass vfio_virqfd vfio_iommu_type1 vfio fuse
CPU: 24 PID: 0 Comm: swapper/24 Tainted: P           OE    
5.8.14-200.fc32.x86_64 #1
Hardware name: Gigabyte Technology Co., Ltd. X399 DESIGNARE EX/X399 DESIGNARE
EX-CF, BIOS F12 12/11/2019
RIP: 0010:eventfd_signal+0x88/0xa0
Code: 03 00 00 00 4c 89 f7 e8 a6 14 db ff 65 ff 0d bf f1 ca 78 4c 89 ee 4c 89
f7 e8 b4 9c 7f 00 4c 89 e0 5b 5d 41 5c 41 5d 41 5e c3 <0f> 0b 45 31 e4 5b 5d 4c
89 e0 41 5c 41 5d 41 5e c3 0f 1f 80 00 00
RSP: 0018:ffffb5e2c6d2cf38 EFLAGS: 00010002
RAX: 0000000000000001 RBX: ffff894a2a1f1480 RCX: 000000000000001f
RDX: ffff89423920ce00 RSI: 0000000000000001 RDI: ffff894929afc580
RBP: ffff89423920cea4 R08: ffffb5e2c6d2cff8 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 00000000000000a1
R13: 0000000000000000 R14: ffffb5e2c6d2cfb4 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffff89423f180000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000055ae44aa8990 CR3: 0000000797204000 CR4: 00000000003406e0
Call Trace:
 <IRQ>
 vfio_msihandler+0x12/0x20 [vfio_pci]
 __handle_irq_event_percpu+0x42/0x180
 handle_irq_event+0x47/0x8a
 handle_edge_irq+0x87/0x220
 asm_call_irq_on_stack+0x12/0x20
 </IRQ>
 common_interrupt+0xb2/0x140
 asm_common_interrupt+0x1e/0x40
RIP: 0010:cpuidle_enter_state+0xb6/0x3f0
Code: 90 a5 6b 78 e8 5b be 7b ff 49 89 c7 0f 1f 44 00 00 31 ff e8 2c d7 7b ff
80 7c 24 0f 00 0f 85 d4 01 00 00 fb 66 0f 1f 44 00 00 <45> 85 e4 0f 88 e0 01 00
00 49 63 d4 4c 2b 7c 24 10 48 8d 04 52 48
RSP: 0018:ffffb5e2c0337e88 EFLAGS: 00000246
RAX: ffff89423f1aa2c0 RBX: ffff89423366e400 RCX: 000000000000001f
RDX: 0000000000000000 RSI: 000000002abf3055 RDI: 0000000000000000
RBP: ffffffff88b78940 R08: 00000a86556fd237 R09: 0000000000000018
R10: 0000000000002358 R11: 0000000000000781 R12: 0000000000000002
R13: ffff89423366e400 R14: 0000000000000002 R15: 00000a86556fd237
 ? cpuidle_enter_state+0xa4/0x3f0
 cpuidle_enter+0x29/0x40
 do_idle+0x1d5/0x2a0
 cpu_startup_entry+0x19/0x20
 start_secondary+0x144/0x170
 secondary_startup_64+0xb6/0xc0
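
Both traces share the same header signature (a WARNING at fs/eventfd.c:74 in
eventfd_signal), so a saved kernel log can be scanned for it to check whether
later hangs follow the same warning. A minimal sketch in Python, assuming the
log is available as plain text; the names WARN_RE and find_warnings are
hypothetical helpers for illustration, not part of any existing tool:

```python
import re

# Matches the WARNING header line seen in both traces above.
# Uses search semantics, so a leading dmesg timestamp does not matter.
WARN_RE = re.compile(
    r"WARNING: CPU: (?P<cpu>\d+) PID: (?P<pid>\d+) at "
    r"(?P<file>\S+):(?P<line>\d+) (?P<func>\w+)\+"
)


def find_warnings(log_text, func="eventfd_signal"):
    """Return (cpu, pid, file, line) for each WARNING raised in `func`."""
    hits = []
    for m in WARN_RE.finditer(log_text):
        if m.group("func") == func:
            hits.append(
                (int(m.group("cpu")), int(m.group("pid")),
                 m.group("file"), int(m.group("line")))
            )
    return hits


# Header lines copied from the two traces in this comment.
sample = (
    "WARNING: CPU: 28 PID: 17651 at fs/eventfd.c:74 eventfd_signal+0x88/0xa0\n"
    "WARNING: CPU: 24 PID: 0 at fs/eventfd.c:74 eventfd_signal+0x88/0xa0\n"
)
print(find_warnings(sample))
```

Feeding it the text of a saved log instead of `sample` would list every
occurrence, which may help correlate the warnings with the time of each hang.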



