Re: 0010:virtnet_rq_alloc+0x8f/0x1b0 [virtio_net] with 6.10.7 and packed virtqueues

So far:

1/ I was able to reproduce the "random memory corruption" issue with
vanilla 6.10.10 in our setup after ~28 minutes of uptime; see the
attached 6.10.10-1.gdc.el9.x86_64.log.
2/ I reverted these commits in our next build:
"virtio_net: rx remove premapped failover code":
defd28aa5acb0fd7c15adc6bc40a8ac277d04dea
"virtio_net: big mode skip the unmap check":
a377ae542d8d0a20a3173da3bbba72e045bea7a9
"virtio_ring: enable premapped mode whatever use_dma_api":
f9dac92ba9081062a6477ee015bd3b8c5914efc4
So far the environment is stable and has not crashed under the same
conditions that triggered the previous crash.
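
For anyone who wants to try the same three reverts on a local 6.10.y
tree, the following is a minimal sketch that only prints the git
commands, in the order the commits are listed above. The helper name
revert_commands and the use of --no-edit are my assumptions for
illustration and are not taken from our build scripts; the commit IDs
and subjects are copied from the list above.

```python
import shlex

# Commit IDs and subjects copied from the list above, in the same order.
REVERTS = [
    ("defd28aa5acb0fd7c15adc6bc40a8ac277d04dea",
     "virtio_net: rx remove premapped failover code"),
    ("a377ae542d8d0a20a3173da3bbba72e045bea7a9",
     "virtio_net: big mode skip the unmap check"),
    ("f9dac92ba9081062a6477ee015bd3b8c5914efc4",
     "virtio_ring: enable premapped mode whatever use_dma_api"),
]


def revert_commands(commits):
    """Return one `git revert --no-edit <sha>` argv list per commit.

    Hypothetical helper: it builds the commands but does not run them.
    """
    return [["git", "revert", "--no-edit", sha] for sha, _subject in commits]


if __name__ == "__main__":
    # Print the commands so they can be reviewed before being run by hand.
    for argv in revert_commands(REVERTS):
        print(shlex.join(argv))
```

The script deliberately does not invoke git itself, so the printed
commands can be checked against the tree before anything is changed.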


On Fri, 13 Sep 2024 at 10:51, Linux regression tracking (Thorsten
Leemhuis) <regressions@xxxxxxxxxxxxx> wrote:
>
> On 13.09.24 10:42, Xuan Zhuo wrote:
> > On Fri, 13 Sep 2024 10:26:57 +0200, "Linux regression tracking (Thorsten Leemhuis)" <regressions@xxxxxxxxxxxxx> wrote:
> >> [CCing a few people that know more about this stuff than I do]
> >>
> >> On 13.09.24 09:50, Jaroslav Pulchart wrote:
> >>>
> >>> actually I'm getting random memory corruption related crashes after
> >>> updating to 6.10.y. My expectation is that it relates to this issue:
> >>> https://bugzilla.kernel.org/show_bug.cgi?id=219154
> >>> It looks like it is almost 1 month ago
> >>
> >> A lot of developers ignore bugzilla.
> >>
> >>> already from the last comment
> >>> there. However, the patches fixing the regression are not reverted from
> >>> the 6.10.y tree which surprises me.
> >>>
> >>> I will try to revert them from our builds and see if it helps to avoid
> >>> random daily happening crashes.
> >>
> >> Not my area of expertise, but to me it sounds like the problem will be
> >> resolved by "Revert "virtio_net: rx enable premapped mode by default"":
> >> https://lore.kernel.org/all/20240820071913.68004-1-xuanzhuo@xxxxxxxxxxxxxxxxx/
> >
> > YES. That is merged into net.
>
> Well, yes, but TWIMC to avoid confusion, it's already one step further,
> as mentioned:
>
> >> That set just landed in mainline.
>
> See
> https://git.kernel.org/torvalds/c/48aa361c5db0b380c2b75c24984c0d3e7c1e8c09
> or
> https://git.kernel.org/torvalds/c/111fc9f517cb293c4213673733b980123c3b0209
>
> Ciao, Thorsten



-- 
Jaroslav Pulchart
Sr. Principal SW Engineer
GoodData
[ 2224.743780] Oops: stack segment: 0000 [#1] PREEMPT SMP NOPTI
[ 2224.744605] CPU: 1 PID: 52 Comm: kswapd0 Tainted: G            E      6.10.10-1.gdc.el9.x86_64 #1
[ 2224.745375] Hardware name: RDO OpenStack Compute/RHEL, BIOS edk2-20240524-1.el9 05/24/2024
[ 2224.746094] RIP: 0010:refill_obj_stock+0x40/0x170
[ 2224.746629] Code: 5c fa 65 48 8b 05 c8 c4 bd 77 4c 8d b8 60 12 03 00 49 8b 47 10 48 39 f8 74 5d 4c 89 ff e8 78 ed ff ff 49 89 c6 e8 f0 34 d7 ff <48> 8b 45 00 a8 03 0f 85 ca 00 00 00 65 48 ff 00 e8 ab 74 d7 ff 49
[ 2224.748241] RSP: 0018:ffffa5024010ce10 EFLAGS: 00010002
[ 2224.748803] RAX: 0000000000000002 RBX: 00000000000000c8 RCX: 00002d82d4038240
[ 2224.749449] RDX: ffff977b00aa9a00 RSI: 0000000000000001 RDI: ffff977b00aa9a00
[ 2224.750082] RBP: a91ef76620614d85 R08: 0000000000000001 R09: ffffffff881b9077
[ 2224.750720] R10: 0000000000040000 R11: 0000000000000000 R12: 0000000000000282
[ 2224.751359] R13: ffff977b00235c00 R14: ffff977baa14e280 R15: ffff977f6bd31260
[ 2224.752183] FS:  0000000000000000(0000) GS:ffff977f6bd00000(0000) knlGS:0000000000000000
[ 2224.752952] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2224.753593] CR2: 00007f2d7e5dc000 CR3: 0000000222340005 CR4: 0000000000770ef0
[ 2224.754271] PKRU: 55555554
[ 2224.754697] Call Trace:
[ 2224.755112]  <IRQ>
[ 2224.755509]  ? die+0x33/0x90
[ 2224.755949]  ? do_trap+0xd9/0x100
[ 2224.756418]  ? do_error_trap+0x65/0x80
[ 2224.756903]  ? exc_stack_segment+0x35/0x50
[ 2224.757417]  ? asm_exc_stack_segment+0x22/0x30
[ 2224.757999]  ? rcu_do_batch+0x1a7/0x530
[ 2224.758549]  ? refill_obj_stock+0x40/0x170
[ 2224.759125]  __memcg_slab_free_hook+0xb0/0x140
[ 2224.759723]  kmem_cache_free+0x3b2/0x3e0
[ 2224.760292]  ? rcu_do_batch+0x1a7/0x530
[ 2224.760845]  rcu_do_batch+0x1a7/0x530
[ 2224.761399]  ? rcu_do_batch+0x13b/0x530
[ 2224.761950]  rcu_core+0x256/0x420
[ 2224.762475]  ? ktime_get+0x34/0xc0
[ 2224.763010]  handle_softirqs+0xd3/0x2b0
[ 2224.763573]  __irq_exit_rcu+0x9b/0xc0
[ 2224.764118]  sysvec_apic_timer_interrupt+0x71/0x90
[ 2224.764738]  </IRQ>
[ 2224.765159]  <TASK>
[ 2224.765594]  asm_sysvec_apic_timer_interrupt+0x16/0x20
[ 2224.766163] RIP: 0010:mem_cgroup_from_slab_obj+0x51/0x130
[ 2224.766750] Code: 01 c8 48 8b 35 58 9d 28 01 48 c1 e8 0c 48 c1 e0 06 48 01 f0 48 8b 78 08 48 89 c1 40 f6 c7 01 0f 85 cd 00 00 00 66 90 8b 41 30 <25> 00 10 00 f0 3d 00 00 00 f0 74 45 48 8b 51 38 f6 c2 01 75 15 48
[ 2224.768355] RSP: 0018:ffffa502403cfa70 EFLAGS: 00000202
[ 2224.768994] RAX: 00000000ffffefff RBX: ffff977b9fbb7000 RCX: ffffc69214c0b500
[ 2224.769747] RDX: ffff977f302d6a40 RSI: ffffc69200000000 RDI: ffffc69214c0b501
[ 2224.770504] RBP: ffff977f302d6a40 R08: ffff977f300e58c8 R09: ffff977f300e58c8
[ 2224.771246] R10: 0000000000000000 R11: ffffa502403cf900 R12: ffff977b9fbb7498
[ 2224.771974] R13: 0000000000000000 R14: ffff977b9fbb7070 R15: 0000000000000000
[ 2224.772678]  list_lru_add_obj+0x6b/0xa0
[ 2224.773158]  iput+0x1f1/0x210
[ 2224.773596]  __dentry_kill+0x71/0x170
[ 2224.774055]  shrink_dentry_list+0x67/0xe0
[ 2224.774542]  prune_dcache_sb+0x54/0x80
[ 2224.774996]  super_cache_scan+0x120/0x1c0
[ 2224.775470]  do_shrink_slab+0x134/0x350
[ 2224.775916]  shrink_slab_memcg+0x199/0x2c0
[ 2224.776387]  shrink_one+0x118/0x1b0
[ 2224.776845]  shrink_many+0x127/0x2a0
[ 2224.777314]  shrink_node+0x3d7/0x430
[ 2224.777765]  ? pick_next_task+0x5a/0xae0
[ 2224.778250]  balance_pgdat+0x29c/0x730
[ 2224.778704]  ? __try_to_del_timer_sync+0x62/0xa0
[ 2224.779227]  ? __pfx_kswapd+0x10/0x10
[ 2224.779674]  kswapd+0xf7/0x180
[ 2224.780082]  kthread+0xcc/0x100
[ 2224.780483]  ? __pfx_kthread+0x10/0x10
[ 2224.780887]  ret_from_fork+0x2d/0x50
[ 2224.781297]  ? __pfx_kthread+0x10/0x10
[ 2224.781703]  ret_from_fork_asm+0x1a/0x30
[ 2224.782118]  </TASK>
[ 2224.782451] Modules linked in: udp_diag(E) tcp_diag(E) inet_diag(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) binfmt_misc(E) zram(E) tls(E) isofs(E) intel_rapl_msr(E) intel_rapl_common(E) kvm_amd(E) ccp(E) kvm(E) virtio_gpu(E) virtio_net(E) i2c_i801(E) i2c_smbus(E) net_failover(E) failover(E) dimlib(E) virtio_dma_buf(E) virtio_balloon(E) vfat(E) fat(E) fuse(E) ext4(E) mbcache(E) jbd2(E) sr_mod(E) cdrom(E) sg(E) ahci(E) libahci(E) libata(E) crct10dif_pclmul(E) crc32_pclmul(E) polyval_clmulni(E) polyval_generic(E) ghash_clmulni_intel(E) sha512_ssse3(E) virtio_blk(E) serio_raw(E) btrfs(E) xor(E) zstd_compress(E) raid6_pq(E) libcrc32c(E) crc32c_intel(E) dm_mirror(E) dm_region_hash(E) dm_log(E) dm_mod(E)
[ 2224.782487] Unloaded tainted modules: amd_atl(E):2 edac_mce_amd(E):1 padlock_aes(E):3
[ 2224.787698] ---[ end trace 0000000000000000 ]---
[ 2224.788286] RIP: 0010:refill_obj_stock+0x40/0x170
[ 2224.788860] Code: 5c fa 65 48 8b 05 c8 c4 bd 77 4c 8d b8 60 12 03 00 49 8b 47 10 48 39 f8 74 5d 4c 89 ff e8 78 ed ff ff 49 89 c6 e8 f0 34 d7 ff <48> 8b 45 00 a8 03 0f 85 ca 00 00 00 65 48 ff 00 e8 ab 74 d7 ff 49
[ 2224.790600] RSP: 0018:ffffa5024010ce10 EFLAGS: 00010002
[ 2224.791230] RAX: 0000000000000002 RBX: 00000000000000c8 RCX: 00002d82d4038240
[ 2224.791924] RDX: ffff977b00aa9a00 RSI: 0000000000000001 RDI: ffff977b00aa9a00
[ 2224.792610] RBP: a91ef76620614d85 R08: 0000000000000001 R09: ffffffff881b9077
[ 2224.793303] R10: 0000000000040000 R11: 0000000000000000 R12: 0000000000000282
[ 2224.793985] R13: ffff977b00235c00 R14: ffff977baa14e280 R15: ffff977f6bd31260
[ 2224.794681] FS:  0000000000000000(0000) GS:ffff977f6bd00000(0000) knlGS:0000000000000000
[ 2224.795439] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2224.796117] CR2: 00007f2d7e5dc000 CR3: 0000000222340005 CR4: 0000000000770ef0
[ 2224.796887] PKRU: 55555554
[ 2224.797384] Kernel panic - not syncing: Fatal exception in interrupt
[ 2224.798304] Kernel Offset: 0x7000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[ 2224.799190] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---

