Re: 0010:virtnet_rq_alloc+0x8f/0x1b0 [virtio_net] with 6.10.7 and packed virtqueues

Hello,

I have been getting random crashes related to memory corruption since
updating to 6.10.y. I suspect they are related to this issue:
https://bugzilla.kernel.org/show_bug.cgi?id=219154
It has been almost a month since the last comment there, yet the
patches involved in the regression have still not been reverted from
the 6.10.y tree, which surprises me.

I will try reverting them in our builds to see whether that stops the
crashes, which currently occur at random about once a day.

Best

On Fri, 6 Sep 2024 at 8:20, Jaroslav Pulchart
<jaroslav.pulchart@xxxxxxxxxxxx> wrote:
>
> Hello,
>
> My virtual machine crashed with the message
> "0010:virtnet_rq_alloc+0x8f/0x1b0 [virtio_net]". See the full log
> below.
>
> I made two changes:
> * updated the VM's packages (kernel from 6.9.5 to 6.10.7)
> * enabled "packed virtqueues" via libvirt on the host
> The crash happens after a few hours of uptime.
>
> Any hints on how to prevent or fix this issue?
>
> [52890.265362] BUG: unable to handle page fault for address: ffff9b94c480000c
> [52890.266264] #PF: supervisor write access in kernel mode
> [52890.266814] #PF: error_code(0x000b) - reserved bit violation
> [52890.267299] PGD 4c3c01067 P4D 4c3c01067 PUD 103be6063 PMD 7849cc063
> PTE 7a1dd28f4e77cee7
> [52890.267926] Oops: Oops: 000b [#1] PREEMPT SMP NOPTI
> [52890.268372] CPU: 14 PID: 0 Comm: swapper/14 Tainted: G            E
>      6.10.7-1.gdc.el9.x86_64 #1
> [52890.269007] Hardware name: RDO OpenStack Compute/RHEL, BIOS
> edk2-20240524-1.el9 05/24/2024
> [52890.269853] RIP: 0010:virtnet_rq_alloc+0x8f/0x1b0 [virtio_net]
> [52890.270349] Code: e7 e8 85 ef ff ff 49 c7 84 24 80 05 00 00 00 00
> 00 00 41 0f b7 84 24 ac 02 00 00 48 8d 73 10 45 31 c0 b9 02 00 00 00
> 8d 50 f0 <66> 89 53 0c 49 8b 3c 24 0f b7 d2 e8 f1 04 9d cb 49 8b 3c 24
> 48 89
> [52890.272173] RSP: 0018:ffffb5dac64f0d60 EFLAGS: 00010246
> [52890.272637] RAX: 0000000000001000 RBX: ffff9b94c4800000 RCX: 0000000000000002
> [52890.273209] RDX: 0000000000000ff0 RSI: ffff9b94c4800010 RDI: ffffe06380000000
> [52890.273772] RBP: 0000000000001000 R08: 0000000000000000 R09: ffffe0639e120008
> [52890.274341] R10: 0000000000000000 R11: 00000000000096d0 R12: ffff9b8ebe621800
> [52890.274898] R13: ffff9b8ebe621800 R14: 0000000000001000 R15: 0000000000000000
> [52890.275476] FS:  0000000000000000(0000) GS:ffff9b9ba3d00000(0000)
> knlGS:0000000000000000
> [52890.276087] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [52890.276592] CR2: ffff9b94c480000c CR3: 00000004c3b1c006 CR4: 0000000000770ef0
> [52890.277167] PKRU: 55555554
> [52890.277524] Call Trace:
> [52890.277869]  <IRQ>
> [52890.278198]  ? __die+0x20/0x70
> [52890.278580]  ? page_fault_oops+0x75/0x170
> [52890.279009]  ? exc_page_fault+0xbe/0x160
> [52890.279441]  ? asm_exc_page_fault+0x22/0x30
> [52890.279881]  ? virtnet_rq_alloc+0x8f/0x1b0 [virtio_net]
> [52890.280377]  try_fill_recv+0x22c/0x440 [virtio_net]
> [52890.280848]  virtnet_receive+0x1ce/0x230 [virtio_net]
> [52890.281334]  virtnet_poll+0x179/0x3a0 [virtio_net]
> [52890.281804]  __napi_poll+0x29/0x1b0
> [52890.282222]  net_rx_action+0x2b5/0x390
> [52890.282641]  ? _raw_spin_unlock_irqrestore+0xa/0x30
> [52890.283118]  handle_softirqs+0xd3/0x2b0
> [52890.283550]  __irq_exit_rcu+0x9b/0xc0
> [52890.283970]  common_interrupt+0x7f/0xa0
> [52890.284409]  </IRQ>
> [52890.284749]  <TASK>
> [52890.285084]  asm_common_interrupt+0x22/0x40
> [52890.285526] RIP: 0010:default_idle+0xb/0x20
> [52890.285953] Code: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 90
> 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 eb 07 0f 00 2d c3 92 30
> 00 fb f4 <fa> c3 cc cc cc cc 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40
> 00 90
> [52890.287332] RSP: 0018:ffffb5dac436fec0 EFLAGS: 00000206
> [52890.287814] RAX: 000000000000000e RBX: ffff9b94c320ce00 RCX: 0000000103227730
> [52890.288395] RDX: 000000000000000e RSI: 0000000000000082 RDI: 000000002f3198a4
> [52890.288962] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000001
> [52890.289533] R10: 00000000000002cd R11: 0000000000000000 R12: 0000000000000000
> [52890.290092] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
> [52890.290657]  default_idle_call+0x2c/0xf0
> [52890.291053]  cpuidle_idle_call+0x109/0x120
> [52890.291464]  do_idle+0x76/0xb0
> [52890.291813]  cpu_startup_entry+0x25/0x30
> [52890.292269]  start_secondary+0x113/0x130
> [52890.292812]  common_startup_64+0x13e/0x141
> [52890.293279]  </TASK>
> [52890.293615] Modules linked in: mptcp_diag(E) xsk_diag(E)
> raw_diag(E) unix_diag(E) af_packet_diag(E) netlink_diag(E) tcp_diag(E)
> udp_diag(E) inet_diag(E) nf_conntrack(E) nf_defrag_ipv6(E)
> nf_defrag_ipv4(E) binfmt_misc(E) zram(E) tls(E) isofs(E) vfat(E)
> fat(E) intel_rapl_msr(E) intel_rapl_common(E) kvm_amd(E) ccp(E) kvm(E)
> i2c_i801(E) virtio_net(E) net_failover(E) failover(E) virtio_gpu(E)
> i2c_smbus(E) dimlib(E) virtio_balloon(E) virtio_dma_buf(E) fuse(E)
> ext4(E) mbcache(E) jbd2(E) sr_mod(E) cdrom(E) sg(E) ahci(E)
> crct10dif_pclmul(E) libahci(E) crc32_pclmul(E) polyval_clmulni(E)
> polyval_generic(E) libata(E) ghash_clmulni_intel(E) sha512_ssse3(E)
> virtio_blk(E) serio_raw(E) btrfs(E) xor(E) zstd_compress(E)
> raid6_pq(E) libcrc32c(E) crc32c_intel(E) dm_mirror(E)
> dm_region_hash(E) dm_log(E) dm_mod(E)
> [52890.293657] Unloaded tainted modules: edac_mce_amd(E):1
> amd_atl(E):2 padlock_aes(E):3
> [52890.299716] CR2: ffff9b94c480000c
> [52890.300092] ---[ end trace 0000000000000000 ]---
> [52890.300101] Oops: general protection fault, probably for
> non-canonical address 0x86304c8ed4b709b3: 0000 [#2] PREEMPT SMP NOPTI
> [52890.300336] RIP: 0010:virtnet_rq_alloc+0x8f/0x1b0 [virtio_net]
> [52890.301177] CPU: 10 PID: 51 Comm: ksoftirqd/10 Tainted: G      D
>  E      6.10.7-1.gdc.el9.x86_64 #1
> [52890.301472] Code: e7 e8 85 ef ff ff 49 c7 84 24 80 05 00 00 00 00
> 00 00 41 0f b7 84 24 ac 02 00 00 48 8d 73 10 45 31 c0 b9 02 00 00 00
> 8d 50 f0 <66> 89 53 0c 49 8b 3c 24 0f b7 d2 e8 f1 04 9d cb 49 8b 3c 24
> 48 89
> [52890.302137] Hardware name: RDO OpenStack Compute/RHEL, BIOS
> edk2-20240524-1.el9 05/24/2024
> [52890.303074] RSP: 0018:ffffb5dac64f0d60 EFLAGS: 00010246
> [52890.303689] RIP: 0010:put_cpu_partial+0x15/0x70
> [52890.303895]
> [52890.304291] Code: 00 00 00 66 90 90 90 90 90 90 90 90 90 90 90 90
> 90 90 90 90 90 0f 1f 44 00 00 9c 59 fa 48 8b 07 65 4c 8b 40 18 4d 85
> c0 74 54 <41> 8b 40 18 85 d2 75 22 83 c0 01 89 46 18 4c 89 46 10 48 8b
> 07 65
> [52890.304556] RAX: 0000000000001000 RBX: ffff9b94c4800000 RCX: 0000000000000002
> [52890.304680] RSP: 0018:ffffb5dac6473d48 EFLAGS: 00010082
> [52890.305597] RDX: 0000000000000ff0 RSI: ffff9b94c4800010 RDI: ffffe06380000000
> [52890.306115]
> [52890.306384] RBP: 0000000000001000 R08: 0000000000000000 R09: ffffe0639e120008
> [52890.306910] RAX: 00003a3f1c003ff0 RBX: ffff9b99322a16d0 RCX: 0000000000000246
> [52890.307010] R10: 0000000000000000 R11: 00000000000096d0 R12: ffff9b8ebe621800
> [52890.307574] RDX: 0000000000000001 RSI: ffffe063afc8a800 RDI: ffff9b94cc48fc00
> [52890.307927] R13: ffff9b8ebe621800 R14: 0000000000001000 R15: 0000000000000000
> [52890.308450] RBP: ffffb5dac6473da0 R08: 86304c8ed4b7099b R09: 00000000001c001b
> [52890.309006] FS:  0000000000000000(0000) GS:ffff9b9ba3d00000(0000)
> knlGS:0000000000000000
> [52890.309543] R10: 0000000000040000 R11: 0000000000000001 R12: ffffe063afc8a840
> [52890.310127] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [52890.310722] R13: ffffe063afc8a800 R14: ffff9b94cc48fc00 R15: ffff9b9ba3b36100
> [52890.311314] CR2: ffff9b94c480000c CR3: 00000004c3b1c006 CR4: 0000000000770ef0
> [52890.311735] FS:  0000000000000000(0000) GS:ffff9b9ba3b00000(0000)
> knlGS:0000000000000000
> [52890.312333] PKRU: 55555554
> [52890.312850] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [52890.313298] Kernel panic - not syncing: Fatal exception in interrupt
> [52891.350776] Shutting down cpus with NMI
> [52891.358926] Kernel Offset: 0xae00000 from 0xffffffff81000000
> (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
> [52891.359713] ---[ end Kernel panic - not syncing: Fatal exception in
> interrupt ]---
>
> Best,
> Jaroslav Pulchart



-- 
Jaroslav Pulchart
Sr. Principal SW Engineer
GoodData
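
As an aid to reading the first oops in the quoted log, here is a small sketch that decodes the page-fault error code and cross-checks the faulting address against the register dump. The bit names follow the x86 page-fault error code layout. The instruction decoding in the comments is my own reading of the "Code:" bytes, not something stated in the report, so treat it as an assumption.

```python
# Bit positions of the x86 page-fault error code (architectural layout).
PF_BITS = {
    0: "present",            # set: fault on a present page; clear: not present
    1: "write",              # set: write access; clear: read access
    2: "user",               # set: user mode; clear: supervisor mode
    3: "reserved-bit",       # set: a reserved bit was set in a paging entry
    4: "instruction-fetch",
    5: "protection-key",
    6: "shadow-stack",
    15: "SGX",
}


def decode_pf_error(code: int) -> list[str]:
    """Return the names of the bits set in a page-fault error code."""
    return [name for bit, name in PF_BITS.items() if code & (1 << bit)]


# Values copied from the first oops in the quoted log.
error_code = 0x000B
rax = 0x0000000000001000
rbx = 0xFFFF9B94C4800000
cr2 = 0xFFFF9B94C480000C

flags = decode_pf_error(error_code)
print(flags)

# Assumption: the marked bytes "<66> 89 53 0c" decode to a 16-bit store of
# %dx to 0xc(%rbx), so the faulting address should be RBX + 0xc.
fault_addr = rbx + 0xC
print(hex(fault_addr), fault_addr == cr2)

# Assumption: the preceding bytes "8d 50 f0" decode to lea -0x10(%rax),%edx,
# which should reproduce the RDX value shown in the register dump.
edx = (rax - 0x10) & 0xFFFFFFFF
print(hex(edx))
```

Under these assumptions the decoded flags agree with the "supervisor write access" and "reserved bit violation" lines, and the computed address equals CR2. That would point to a store into the start of the buffer held in RBX whose page-table entry looks corrupted (see the PTE value in the log).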




