Hello,

I'm getting random memory-corruption crashes after updating to 6.10.y. I
suspect they are related to this issue:
https://bugzilla.kernel.org/show_bug.cgi?id=219154

It has been almost a month since the last comment there, yet the patches
discussed there have still not been reverted in the 6.10.y tree, which
surprises me. I will try reverting them in our builds to see whether that
stops the crashes, which occur at random, roughly daily.

Best

On Fri, 6 Sep 2024 at 8:20, Jaroslav Pulchart <jaroslav.pulchart@xxxxxxxxxxxx> wrote:
>
> Hello,
>
> My virtual machine crashed with the message
> "0010:virtnet_rq_alloc+0x8f/0x1b0 [virtio_net]". See the full log
> below.
>
> I did two changes:
> * Updated my VM packages (kernel from 6.9.5 to 6.10.7) on the VM
> * enabled "packed virtqueues" by libvirt on host
> and it happens after a few hours of uptime.
>
> Any hint how to prevent it or fix this issue?
>
> [52890.265362] BUG: unable to handle page fault for address: ffff9b94c480000c
> [52890.266264] #PF: supervisor write access in kernel mode
> [52890.266814] #PF: error_code(0x000b) - reserved bit violation
> [52890.267299] PGD 4c3c01067 P4D 4c3c01067 PUD 103be6063 PMD 7849cc063
> PTE 7a1dd28f4e77cee7
> [52890.267926] Oops: Oops: 000b [#1] PREEMPT SMP NOPTI
> [52890.268372] CPU: 14 PID: 0 Comm: swapper/14 Tainted: G E
> 6.10.7-1.gdc.el9.x86_64 #1
> [52890.269007] Hardware name: RDO OpenStack Compute/RHEL, BIOS
> edk2-20240524-1.el9 05/24/2024
> [52890.269853] RIP: 0010:virtnet_rq_alloc+0x8f/0x1b0 [virtio_net]
> [52890.270349] Code: e7 e8 85 ef ff ff 49 c7 84 24 80 05 00 00 00 00
> 00 00 41 0f b7 84 24 ac 02 00 00 48 8d 73 10 45 31 c0 b9 02 00 00 00
> 8d 50 f0 <66> 89 53 0c 49 8b 3c 24 0f b7 d2 e8 f1 04 9d cb 49 8b 3c 24
> 48 89
> [52890.272173] RSP: 0018:ffffb5dac64f0d60 EFLAGS: 00010246
> [52890.272637] RAX: 0000000000001000 RBX: ffff9b94c4800000 RCX: 0000000000000002
> [52890.273209] RDX: 0000000000000ff0 RSI: ffff9b94c4800010 RDI: ffffe06380000000
> [52890.273772] RBP: 0000000000001000 R08: 0000000000000000 R09: ffffe0639e120008
> [52890.274341] R10: 0000000000000000 R11: 00000000000096d0 R12: ffff9b8ebe621800
> [52890.274898] R13: ffff9b8ebe621800 R14: 0000000000001000 R15: 0000000000000000
> [52890.275476] FS: 0000000000000000(0000) GS:ffff9b9ba3d00000(0000)
> knlGS:0000000000000000
> [52890.276087] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [52890.276592] CR2: ffff9b94c480000c CR3: 00000004c3b1c006 CR4: 0000000000770ef0
> [52890.277167] PKRU: 55555554
> [52890.277524] Call Trace:
> [52890.277869] <IRQ>
> [52890.278198] ? __die+0x20/0x70
> [52890.278580] ? page_fault_oops+0x75/0x170
> [52890.279009] ? exc_page_fault+0xbe/0x160
> [52890.279441] ? asm_exc_page_fault+0x22/0x30
> [52890.279881] ? virtnet_rq_alloc+0x8f/0x1b0 [virtio_net]
> [52890.280377] try_fill_recv+0x22c/0x440 [virtio_net]
> [52890.280848] virtnet_receive+0x1ce/0x230 [virtio_net]
> [52890.281334] virtnet_poll+0x179/0x3a0 [virtio_net]
> [52890.281804] __napi_poll+0x29/0x1b0
> [52890.282222] net_rx_action+0x2b5/0x390
> [52890.282641] ? _raw_spin_unlock_irqrestore+0xa/0x30
> [52890.283118] handle_softirqs+0xd3/0x2b0
> [52890.283550] __irq_exit_rcu+0x9b/0xc0
> [52890.283970] common_interrupt+0x7f/0xa0
> [52890.284409] </IRQ>
> [52890.284749] <TASK>
> [52890.285084] asm_common_interrupt+0x22/0x40
> [52890.285526] RIP: 0010:default_idle+0xb/0x20
> [52890.285953] Code: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 90
> 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 eb 07 0f 00 2d c3 92 30
> 00 fb f4 <fa> c3 cc cc cc cc 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40
> 00 90
> [52890.287332] RSP: 0018:ffffb5dac436fec0 EFLAGS: 00000206
> [52890.287814] RAX: 000000000000000e RBX: ffff9b94c320ce00 RCX: 0000000103227730
> [52890.288395] RDX: 000000000000000e RSI: 0000000000000082 RDI: 000000002f3198a4
> [52890.288962] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000001
> [52890.289533] R10: 00000000000002cd R11: 0000000000000000 R12: 0000000000000000
> [52890.290092] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
> [52890.290657] default_idle_call+0x2c/0xf0
> [52890.291053] cpuidle_idle_call+0x109/0x120
> [52890.291464] do_idle+0x76/0xb0
> [52890.291813] cpu_startup_entry+0x25/0x30
> [52890.292269] start_secondary+0x113/0x130
> [52890.292812] common_startup_64+0x13e/0x141
> [52890.293279] </TASK>
> [52890.293615] Modules linked in: mptcp_diag(E) xsk_diag(E)
> raw_diag(E) unix_diag(E) af_packet_diag(E) netlink_diag(E) tcp_diag(E)
> udp_diag(E) inet_diag(E) nf_conntrack(E) nf_defrag_ipv6(E)
> nf_defrag_ipv4(E) binfmt_misc(E) zram(E) tls(E) isofs(E) vfat(E)
> fat(E) intel_rapl_msr(E) intel_rapl_common(E) kvm_amd(E) ccp(E) kvm(E)
> i2c_i801(E) virtio_net(E) net_failover(E) failover(E) virtio_gpu(E)
> i2c_smbus(E) dimlib(E) virtio_balloon(E) virtio_dma_buf(E) fuse(E)
> ext4(E) mbcache(E) jbd2(E) sr_mod(E) cdrom(E) sg(E) ahci(E)
> crct10dif_pclmul(E) libahci(E) crc32_pclmul(E) polyval_clmulni(E)
> polyval_generic(E) libata(E) ghash_clmulni_intel(E) sha512_ssse3(E)
> virtio_blk(E) serio_raw(E) btrfs(E) xor(E) zstd_compress(E)
> raid6_pq(E) libcrc32c(E) crc32c_intel(E) dm_mirror(E)
> dm_region_hash(E) dm_log(E) dm_mod(E)
> [52890.293657] Unloaded tainted modules: edac_mce_amd(E):1
> amd_atl(E):2 padlock_aes(E):3
> [52890.299716] CR2: ffff9b94c480000c
> [52890.300092] ---[ end trace 0000000000000000 ]---
> [52890.300101] Oops: general protection fault, probably for
> non-canonical address 0x86304c8ed4b709b3: 0000 [#2] PREEMPT SMP NOPTI
> [52890.300336] RIP: 0010:virtnet_rq_alloc+0x8f/0x1b0 [virtio_net]
> [52890.301177] CPU: 10 PID: 51 Comm: ksoftirqd/10 Tainted: G D
> E 6.10.7-1.gdc.el9.x86_64 #1
> [52890.301472] Code: e7 e8 85 ef ff ff 49 c7 84 24 80 05 00 00 00 00
> 00 00 41 0f b7 84 24 ac 02 00 00 48 8d 73 10 45 31 c0 b9 02 00 00 00
> 8d 50 f0 <66> 89 53 0c 49 8b 3c 24 0f b7 d2 e8 f1 04 9d cb 49 8b 3c 24
> 48 89
> [52890.302137] Hardware name: RDO OpenStack Compute/RHEL, BIOS
> edk2-20240524-1.el9 05/24/2024
> [52890.303074] RSP: 0018:ffffb5dac64f0d60 EFLAGS: 00010246
> [52890.303689] RIP: 0010:put_cpu_partial+0x15/0x70
> [52890.303895]
> [52890.304291] Code: 00 00 00 66 90 90 90 90 90 90 90 90 90 90 90 90
> 90 90 90 90 90 0f 1f 44 00 00 9c 59 fa 48 8b 07 65 4c 8b 40 18 4d 85
> c0 74 54 <41> 8b 40 18 85 d2 75 22 83 c0 01 89 46 18 4c 89 46 10 48 8b
> 07 65
> [52890.304556] RAX: 0000000000001000 RBX: ffff9b94c4800000 RCX: 0000000000000002
> [52890.304680] RSP: 0018:ffffb5dac6473d48 EFLAGS: 00010082
> [52890.305597] RDX: 0000000000000ff0 RSI: ffff9b94c4800010 RDI: ffffe06380000000
> [52890.306115]
> [52890.306384] RBP: 0000000000001000 R08: 0000000000000000 R09: ffffe0639e120008
> [52890.306910] RAX: 00003a3f1c003ff0 RBX: ffff9b99322a16d0 RCX: 0000000000000246
> [52890.307010] R10: 0000000000000000 R11: 00000000000096d0 R12: ffff9b8ebe621800
> [52890.307574] RDX: 0000000000000001 RSI: ffffe063afc8a800 RDI: ffff9b94cc48fc00
> [52890.307927] R13: ffff9b8ebe621800 R14: 0000000000001000 R15: 0000000000000000
> [52890.308450] RBP: ffffb5dac6473da0 R08: 86304c8ed4b7099b R09: 00000000001c001b
> [52890.309006] FS: 0000000000000000(0000) GS:ffff9b9ba3d00000(0000)
> knlGS:0000000000000000
> [52890.309543] R10: 0000000000040000 R11: 0000000000000001 R12: ffffe063afc8a840
> [52890.310127] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [52890.310722] R13: ffffe063afc8a800 R14: ffff9b94cc48fc00 R15: ffff9b9ba3b36100
> [52890.311314] CR2: ffff9b94c480000c CR3: 00000004c3b1c006 CR4: 0000000000770ef0
> [52890.311735] FS: 0000000000000000(0000) GS:ffff9b9ba3b00000(0000)
> knlGS:0000000000000000
> [52890.312333] PKRU: 55555554
> [52890.312850] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [52890.313298] Kernel panic - not syncing: Fatal exception in interrupt
> [52891.350776] Shutting down cpus with NMI
> [52891.358926] Kernel Offset: 0xae00000 from 0xffffffff81000000
> (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
> [52891.359713] ---[ end Kernel panic - not syncing: Fatal exception in
> interrupt ]---
>
> Best,
> Jaroslav Pulchart

--
Jaroslav Pulchart
Sr. Principal SW Engineer
GoodData