On Tue, Aug 20, 2024 at 10:46:43AM -0700, Paul E. McKenney wrote:
> On Tue, Aug 20, 2024 at 04:43:39PM +0200, Frederic Weisbecker wrote:
> > Sorry for the HTML mail, I only have my phone ...
> >
> > On Tue, Aug 20, 2024, 13:07, Z qiang <qiang.zhang1211@xxxxxxxxx> wrote:
> >
> > > >
> > > > Hello, Frederic,
> > > >
> > > > I have seen this once on Neeraj's tree with a few commits on top (-rcu
> > > > commit 46774278c74f ("rcutorture: Test start-poll primitives with
> > > > interrupts disabled")).  But only the once so far.
> > > >
> > > > This is the WARN_ON_ONCE(rcu_segcblist_n_cbs(&rdp->cblist)) in
> > > > rcu_nocb_rdp_deoffload().
> > > >
> > > > Thoughts?
> > > >
> > >
> > > In rcu_do_batch(), rcu_segcblist_extract_done_cbs() doesn't decrement
> > > rsclp->len.  So after the barrier RCU callback has been invoked, but
> > > before rsclp->len has been decremented, rcu_barrier() may return and
> > > rcu_nocb_rdp_deoffload() may then check
> > > rcu_segcblist_n_cbs(&rdp->cblist).
> > >
> >
> > That sounds plausible! You just unlocked my thoughts, which have been
> > running in circles since yesterday.
> >
> >
> > > Maybe we can use WARN_ON_ONCE(rcu_segcblist_n_segment_cbs()) instead
> > > of WARN_ON_ONCE(rcu_segcblist_n_cbs())?
> > >
> > > Thoughts?
> >
> > I'll test that once I'm back from vacation on September 2nd. Thanks!
>
> Thank you both!!!
>
> Running -next over last night hit a number of boot-time splats, so I
> have no idea if this reproduces nicely.  Can't have everything!  ;-)

And it is now a two-off given another one in last night's testing.  This
was from 168 hours of TREE01 on my -rcu "dev" branch (as opposed to -next),
but I have run many runs over the past two weeks.  So it is reproducible,
but rare.

Ah, and if it matters, I synched up to Neeraj's latest as of about 18 hours
ago just before starting this test.

							Thanx, Paul

------------------------------------------------------------------------

[13375.559536] ------------[ cut here ]------------
[13375.560748] WARNING: CPU: 27 PID: 103 at kernel/rcu/tree_nocb.h:1061 rcu_nocb_rdp_deoffload+0x292/0x2a0
[13375.563088] Modules linked in:
[13375.563861] CPU: 27 UID: 0 PID: 103 Comm: rcu_nocb_toggle Not tainted 6.11.0-rc1-00141-gc5fc1889f28b #1923
[13375.566261] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
[13375.569047] RIP: 0010:rcu_nocb_rdp_deoffload+0x292/0x2a0
[13375.570373] Code: e9 3c ff ff ff 4c 89 e6 48 89 ef e8 88 fd e0 00 e9 3b fe ff ff 90 0f 0b 90 48 8b 83 c0 00 00 00 48 85 c0 0f 84 0b fe ff ff 90 <0f> 0b 90 e9 02 fe ff ff e8 d1 4f e0 00 90 90 90 90 90 90 90 90 90
[13375.574999] RSP: 0018:ffffafa9c04b3e30 EFLAGS: 00010002
[13375.576302] RAX: 000000000000007c RBX: ffff8f4a1eb6f480 RCX: 000000000000002b
[13375.578069] RDX: 0000000000000001 RSI: 000000000000002b RDI: ffff8f4a1eb6f5f0
[13375.579837] RBP: ffff8f4a1eb6f5f0 R08: 000000000000002a R09: 0000000000000001
[13375.581608] R10: ffffffff8d99b408 R11: 00000000001d1c76 R12: 0000000000000246
[13375.583376] R13: 0000000000000000 R14: ffff8f4a1ea2f480 R15: 0000000000000001
[13375.585149] FS: 0000000000000000(0000) GS:ffff8f4a1f0c0000(0000) knlGS:0000000000000000
[13375.587158] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[13375.588598] CR2: 0000000000000000 CR3: 0000000002e0c000 CR4: 00000000000006f0
[13375.590360] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[13375.592163] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[13375.593943] Call Trace:
[13375.594572]  <TASK>
[13375.595102]  ? __warn+0x7e/0x120
[13375.595917]  ? rcu_nocb_rdp_deoffload+0x292/0x2a0
[13375.597091]  ? report_bug+0x18e/0x1a0
[13375.598013]  ? handle_bug+0x3d/0x70
[13375.598892]  ? exc_invalid_op+0x18/0x70
[13375.599935]  ? asm_exc_invalid_op+0x1a/0x20
[13375.600987]  ? rcu_nocb_rdp_deoffload+0x292/0x2a0
[13375.602163]  rcu_nocb_cpu_deoffload+0x70/0xa0
[13375.603262]  rcu_nocb_toggle+0x136/0x1c0
[13375.604250]  ? __pfx_rcu_nocb_toggle+0x10/0x10
[13375.605361]  kthread+0xd1/0x100
[13375.606159]  ? __pfx_kthread+0x10/0x10
[13375.607102]  ret_from_fork+0x2f/0x50
[13375.608001]  ? __pfx_kthread+0x10/0x10
[13375.608941]  ret_from_fork_asm+0x1a/0x30
[13375.609928]  </TASK>
[13375.610486] ---[ end trace 0000000000000000 ]---