Re: 0010:virtnet_rq_alloc+0x8f/0x1b0 [virtio_net] with 6.10.7 and packed virtqueues

On Fri, Sep 13, 2024 at 11:21:11AM +0200, Jaroslav Pulchart wrote:
> So far:
> 
> 1/ I was able to "do a reproducer" and hit the "random memory
> corruption" issue with vanilla 6.10.10 in our setup in ~28m of uptime,
> see attached 6.10.10-1.gdc.el9.x86_64.log.
> 2/ I reverted these commits
> "virtio_net: rx remove premapped failover code":
> defd28aa5acb0fd7c15adc6bc40a8ac277d04dea
> "virtio_net: big mode skip the unmap check":
> a377ae542d8d0a20a3173da3bbba72e045bea7a9
> "virtio_ring: enable premapped mode whatever use_dma_api":
> f9dac92ba9081062a6477ee015bd3b8c5914efc4
> in our next build, and so far the environment is stable and not
> crashing under the same conditions as the previous crash.


Automated backport failed:

http://lore.kernel.org/all/2024091336-family-daffodil-541d@gregkh

Since you have done the revert, and actually tested it, feel free
to post, I will ack.




> 
> On Fri, Sep 13, 2024 at 10:51, Linux regression tracking (Thorsten
> Leemhuis) <regressions@xxxxxxxxxxxxx> wrote:
> >
> > On 13.09.24 10:42, Xuan Zhuo wrote:
> > > On Fri, 13 Sep 2024 10:26:57 +0200, "Linux regression tracking (Thorsten Leemhuis)" <regressions@xxxxxxxxxxxxx> wrote:
> > >> [CCing a few people that know more about this stuff than I do]
> > >>
> > >> On 13.09.24 09:50, Jaroslav Pulchart wrote:
> > >>>
> > >>> actually I'm getting random memory corruption related crashes after
> > >>> updating to 6.10.y. My expectation is that it relates to this issue:
> > >>> https://bugzilla.kernel.org/show_bug.cgi?id=219154
> > >>> It looks like it is almost 1 month ago
> > >>
> > >> A lot of developers ignore bugzilla.
> > >>
> > >>> already from the last comment
> > >>> there. However, the patches fixing the regression are not reverted from
> > >>> the 6.10.y tree which surprises me.
> > >>>
> > >>> I will try to revert them from our builds and see if it helps to avoid
> > >>> the random crashes that happen daily.
> > >>
> > >> Not my area of expertise, but to me it sounds like the problem will be
> > >> resolved by "Revert "virtio_net: rx enable premapped mode by default"":
> > >> https://lore.kernel.org/all/20240820071913.68004-1-xuanzhuo@xxxxxxxxxxxxxxxxx/
> > >
> > > YES. That is merged into net.
> >
> > Well, yes, but TWIMC to avoid confusion, it's already one step further,
> > as mentioned:
> >
> > >> That set just landed in mainline.
> >
> > See
> > https://git.kernel.org/torvalds/c/48aa361c5db0b380c2b75c24984c0d3e7c1e8c09
> > or
> > https://git.kernel.org/torvalds/c/111fc9f517cb293c4213673733b980123c3b0209
> >
> > Ciao, Thorsten
> 
> 
> 
> -- 
> Jaroslav Pulchart
> Sr. Principal SW Engineer
> GoodData

> [ 2224.743780] Oops: stack segment: 0000 [#1] PREEMPT SMP NOPTI
> [ 2224.744605] CPU: 1 PID: 52 Comm: kswapd0 Tainted: G            E      6.10.10-1.gdc.el9.x86_64 #1
> [ 2224.745375] Hardware name: RDO OpenStack Compute/RHEL, BIOS edk2-20240524-1.el9 05/24/2024
> [ 2224.746094] RIP: 0010:refill_obj_stock+0x40/0x170
> [ 2224.746629] Code: 5c fa 65 48 8b 05 c8 c4 bd 77 4c 8d b8 60 12 03 00 49 8b 47 10 48 39 f8 74 5d 4c 89 ff e8 78 ed ff ff 49 89 c6 e8 f0 34 d7 ff <48> 8b 45 00 a8 03 0f 85 ca 00 00 00 65 48 ff 00 e8 ab 74 d7 ff 49
> [ 2224.748241] RSP: 0018:ffffa5024010ce10 EFLAGS: 00010002
> [ 2224.748803] RAX: 0000000000000002 RBX: 00000000000000c8 RCX: 00002d82d4038240
> [ 2224.749449] RDX: ffff977b00aa9a00 RSI: 0000000000000001 RDI: ffff977b00aa9a00
> [ 2224.750082] RBP: a91ef76620614d85 R08: 0000000000000001 R09: ffffffff881b9077
> [ 2224.750720] R10: 0000000000040000 R11: 0000000000000000 R12: 0000000000000282
> [ 2224.751359] R13: ffff977b00235c00 R14: ffff977baa14e280 R15: ffff977f6bd31260
> [ 2224.752183] FS:  0000000000000000(0000) GS:ffff977f6bd00000(0000) knlGS:0000000000000000
> [ 2224.752952] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 2224.753593] CR2: 00007f2d7e5dc000 CR3: 0000000222340005 CR4: 0000000000770ef0
> [ 2224.754271] PKRU: 55555554
> [ 2224.754697] Call Trace:
> [ 2224.755112]  <IRQ>
> [ 2224.755509]  ? die+0x33/0x90
> [ 2224.755949]  ? do_trap+0xd9/0x100
> [ 2224.756418]  ? do_error_trap+0x65/0x80
> [ 2224.756903]  ? exc_stack_segment+0x35/0x50
> [ 2224.757417]  ? asm_exc_stack_segment+0x22/0x30
> [ 2224.757999]  ? rcu_do_batch+0x1a7/0x530
> [ 2224.758549]  ? refill_obj_stock+0x40/0x170
> [ 2224.759125]  __memcg_slab_free_hook+0xb0/0x140
> [ 2224.759723]  kmem_cache_free+0x3b2/0x3e0
> [ 2224.760292]  ? rcu_do_batch+0x1a7/0x530
> [ 2224.760845]  rcu_do_batch+0x1a7/0x530
> [ 2224.761399]  ? rcu_do_batch+0x13b/0x530
> [ 2224.761950]  rcu_core+0x256/0x420
> [ 2224.762475]  ? ktime_get+0x34/0xc0
> [ 2224.763010]  handle_softirqs+0xd3/0x2b0
> [ 2224.763573]  __irq_exit_rcu+0x9b/0xc0
> [ 2224.764118]  sysvec_apic_timer_interrupt+0x71/0x90
> [ 2224.764738]  </IRQ>
> [ 2224.765159]  <TASK>
> [ 2224.765594]  asm_sysvec_apic_timer_interrupt+0x16/0x20
> [ 2224.766163] RIP: 0010:mem_cgroup_from_slab_obj+0x51/0x130
> [ 2224.766750] Code: 01 c8 48 8b 35 58 9d 28 01 48 c1 e8 0c 48 c1 e0 06 48 01 f0 48 8b 78 08 48 89 c1 40 f6 c7 01 0f 85 cd 00 00 00 66 90 8b 41 30 <25> 00 10 00 f0 3d 00 00 00 f0 74 45 48 8b 51 38 f6 c2 01 75 15 48
> [ 2224.768355] RSP: 0018:ffffa502403cfa70 EFLAGS: 00000202
> [ 2224.768994] RAX: 00000000ffffefff RBX: ffff977b9fbb7000 RCX: ffffc69214c0b500
> [ 2224.769747] RDX: ffff977f302d6a40 RSI: ffffc69200000000 RDI: ffffc69214c0b501
> [ 2224.770504] RBP: ffff977f302d6a40 R08: ffff977f300e58c8 R09: ffff977f300e58c8
> [ 2224.771246] R10: 0000000000000000 R11: ffffa502403cf900 R12: ffff977b9fbb7498
> [ 2224.771974] R13: 0000000000000000 R14: ffff977b9fbb7070 R15: 0000000000000000
> [ 2224.772678]  list_lru_add_obj+0x6b/0xa0
> [ 2224.773158]  iput+0x1f1/0x210
> [ 2224.773596]  __dentry_kill+0x71/0x170
> [ 2224.774055]  shrink_dentry_list+0x67/0xe0
> [ 2224.774542]  prune_dcache_sb+0x54/0x80
> [ 2224.774996]  super_cache_scan+0x120/0x1c0
> [ 2224.775470]  do_shrink_slab+0x134/0x350
> [ 2224.775916]  shrink_slab_memcg+0x199/0x2c0
> [ 2224.776387]  shrink_one+0x118/0x1b0
> [ 2224.776845]  shrink_many+0x127/0x2a0
> [ 2224.777314]  shrink_node+0x3d7/0x430
> [ 2224.777765]  ? pick_next_task+0x5a/0xae0
> [ 2224.778250]  balance_pgdat+0x29c/0x730
> [ 2224.778704]  ? __try_to_del_timer_sync+0x62/0xa0
> [ 2224.779227]  ? __pfx_kswapd+0x10/0x10
> [ 2224.779674]  kswapd+0xf7/0x180
> [ 2224.780082]  kthread+0xcc/0x100
> [ 2224.780483]  ? __pfx_kthread+0x10/0x10
> [ 2224.780887]  ret_from_fork+0x2d/0x50
> [ 2224.781297]  ? __pfx_kthread+0x10/0x10
> [ 2224.781703]  ret_from_fork_asm+0x1a/0x30
> [ 2224.782118]  </TASK>
> [ 2224.782451] Modules linked in: udp_diag(E) tcp_diag(E) inet_diag(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) binfmt_misc(E) zram(E) tls(E) isofs(E) intel_rapl_msr(E) intel_rapl_common(E) kvm_amd(E) ccp(E) kvm(E) virtio_gpu(E) virtio_net(E) i2c_i801(E) i2c_smbus(E) net_failover(E) failover(E) dimlib(E) virtio_dma_buf(E) virtio_balloon(E) vfat(E) fat(E) fuse(E) ext4(E) mbcache(E) jbd2(E) sr_mod(E) cdrom(E) sg(E) ahci(E) libahci(E) libata(E) crct10dif_pclmul(E) crc32_pclmul(E) polyval_clmulni(E) polyval_generic(E) ghash_clmulni_intel(E) sha512_ssse3(E) virtio_blk(E) serio_raw(E) btrfs(E) xor(E) zstd_compress(E) raid6_pq(E) libcrc32c(E) crc32c_intel(E) dm_mirror(E) dm_region_hash(E) dm_log(E) dm_mod(E)
> [ 2224.782487] Unloaded tainted modules: amd_atl(E):2 edac_mce_amd(E):1 padlock_aes(E):3
> [ 2224.787698] ---[ end trace 0000000000000000 ]---
> [ 2224.788286] RIP: 0010:refill_obj_stock+0x40/0x170
> [ 2224.788860] Code: 5c fa 65 48 8b 05 c8 c4 bd 77 4c 8d b8 60 12 03 00 49 8b 47 10 48 39 f8 74 5d 4c 89 ff e8 78 ed ff ff 49 89 c6 e8 f0 34 d7 ff <48> 8b 45 00 a8 03 0f 85 ca 00 00 00 65 48 ff 00 e8 ab 74 d7 ff 49
> [ 2224.790600] RSP: 0018:ffffa5024010ce10 EFLAGS: 00010002
> [ 2224.791230] RAX: 0000000000000002 RBX: 00000000000000c8 RCX: 00002d82d4038240
> [ 2224.791924] RDX: ffff977b00aa9a00 RSI: 0000000000000001 RDI: ffff977b00aa9a00
> [ 2224.792610] RBP: a91ef76620614d85 R08: 0000000000000001 R09: ffffffff881b9077
> [ 2224.793303] R10: 0000000000040000 R11: 0000000000000000 R12: 0000000000000282
> [ 2224.793985] R13: ffff977b00235c00 R14: ffff977baa14e280 R15: ffff977f6bd31260
> [ 2224.794681] FS:  0000000000000000(0000) GS:ffff977f6bd00000(0000) knlGS:0000000000000000
> [ 2224.795439] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 2224.796117] CR2: 00007f2d7e5dc000 CR3: 0000000222340005 CR4: 0000000000770ef0
> [ 2224.796887] PKRU: 55555554
> [ 2224.797384] Kernel panic - not syncing: Fatal exception in interrupt
> [ 2224.798304] Kernel Offset: 0x7000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
> [ 2224.799190] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---
> 




