Re: One-off rcu_nocb_rdp_deoffload bug

On Tue, Aug 20, 2024 at 10:46:43AM -0700, Paul E. McKenney wrote:
> On Tue, Aug 20, 2024 at 04:43:39PM +0200, Frederic Weisbecker wrote:
> > Sorry for the HTML mail, I only have my phone...
> > 
> > On Tue, Aug 20, 2024, 13:07, Z qiang <qiang.zhang1211@xxxxxxxxx> wrote:
> > 
> > > >
> > > > Hello, Frederic,
> > > >
> > > > I have seen this once on Neeraj's tree with a few commits on top (-rcu
> > > > commit 46774278c74f ("rcutorture: Test start-poll primitives with
> > > > interrupts disabled")).  But only once so far.
> > > >
> > > > This is the WARN_ON_ONCE(rcu_segcblist_n_cbs(&rdp->cblist)) in
> > > > rcu_nocb_rdp_deoffload().
> > > >
> > > > Thoughts?
> > > >
> > >
> > > In rcu_do_batch(), rcu_segcblist_extract_done_cbs() doesn't decrement
> > > rsclp->len.  So after the rcu_barrier() callback has been invoked, but
> > > before rsclp->len is decremented, rcu_barrier() can return, and
> > > rcu_nocb_rdp_deoffload() then sees a nonzero
> > > rcu_segcblist_n_cbs(&rdp->cblist).
> > >
> > 
> > That sounds plausible! You just unlocked my thoughts, which had been
> > running in circles since yesterday.
> > 
> > 
> > > Maybe we can use WARN_ON_ONCE(rcu_segcblist_n_segment_cbs()) instead
> > > of WARN_ON_ONCE(rcu_segcblist_n_cbs()).
> > >
> > > Thoughts?
> > 
> > I'll test that once I'm back from vacation on September 2nd. Thanks!
> 
> Thank you both!!!
> 
> Running -next overnight hit a number of boot-time splats, so I
> have no idea whether this reproduces nicely.  Can't have everything!  ;-)

And it is now a two-off, given another occurrence in last night's testing.
This was from 168 hours of TREE01 on my -rcu "dev" branch (as opposed to
-next), but I have done many runs over the past two weeks.  So it is
reproducible, but rare.

Ah, and if it matters, I synched up to Neeraj's latest as of about 18
hours ago just before starting this test.


							Thanx, Paul

------------------------------------------------------------------------

[13375.559536] ------------[ cut here ]------------
[13375.560748] WARNING: CPU: 27 PID: 103 at kernel/rcu/tree_nocb.h:1061 rcu_nocb_rdp_deoffload+0x292/0x2a0
[13375.563088] Modules linked in:
[13375.563861] CPU: 27 UID: 0 PID: 103 Comm: rcu_nocb_toggle Not tainted 6.11.0-rc1-00141-gc5fc1889f28b #1923
[13375.566261] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
[13375.569047] RIP: 0010:rcu_nocb_rdp_deoffload+0x292/0x2a0
[13375.570373] Code: e9 3c ff ff ff 4c 89 e6 48 89 ef e8 88 fd e0 00 e9 3b fe ff ff 90 0f 0b 90 48 8b 83 c0 00 00 00 48 85 c0 0f 84 0b fe ff ff 90 <0f> 0b 90 e9 02 fe ff ff e8 d1 4f e0 00 90 90 90 90 90 90 90 90 90
[13375.574999] RSP: 0018:ffffafa9c04b3e30 EFLAGS: 00010002
[13375.576302] RAX: 000000000000007c RBX: ffff8f4a1eb6f480 RCX: 000000000000002b
[13375.578069] RDX: 0000000000000001 RSI: 000000000000002b RDI: ffff8f4a1eb6f5f0
[13375.579837] RBP: ffff8f4a1eb6f5f0 R08: 000000000000002a R09: 0000000000000001
[13375.581608] R10: ffffffff8d99b408 R11: 00000000001d1c76 R12: 0000000000000246
[13375.583376] R13: 0000000000000000 R14: ffff8f4a1ea2f480 R15: 0000000000000001
[13375.585149] FS:  0000000000000000(0000) GS:ffff8f4a1f0c0000(0000) knlGS:0000000000000000
[13375.587158] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[13375.588598] CR2: 0000000000000000 CR3: 0000000002e0c000 CR4: 00000000000006f0
[13375.590360] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[13375.592163] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[13375.593943] Call Trace:
[13375.594572]  <TASK>
[13375.595102]  ? __warn+0x7e/0x120
[13375.595917]  ? rcu_nocb_rdp_deoffload+0x292/0x2a0
[13375.597091]  ? report_bug+0x18e/0x1a0
[13375.598013]  ? handle_bug+0x3d/0x70
[13375.598892]  ? exc_invalid_op+0x18/0x70
[13375.599935]  ? asm_exc_invalid_op+0x1a/0x20
[13375.600987]  ? rcu_nocb_rdp_deoffload+0x292/0x2a0
[13375.602163]  rcu_nocb_cpu_deoffload+0x70/0xa0
[13375.603262]  rcu_nocb_toggle+0x136/0x1c0
[13375.604250]  ? __pfx_rcu_nocb_toggle+0x10/0x10
[13375.605361]  kthread+0xd1/0x100
[13375.606159]  ? __pfx_kthread+0x10/0x10
[13375.607102]  ret_from_fork+0x2f/0x50
[13375.608001]  ? __pfx_kthread+0x10/0x10
[13375.608941]  ret_from_fork_asm+0x1a/0x30
[13375.609928]  </TASK>
[13375.610486] ---[ end trace 0000000000000000 ]---



