> On Mar 19, 2024, at 5:53 AM, Uladzislau Rezki <urezki@xxxxxxxxx> wrote:
>
> On Mon, Mar 18, 2024 at 05:05:31PM -0400, Joel Fernandes wrote:
>>
>>
>>>> On Mar 18, 2024, at 2:58 PM, Uladzislau Rezki <urezki@xxxxxxxxx> wrote:
>>>
>>> Hello, Joel!
>>>
>>> Sorry for late checking, see below few comments:
>>>
>>>> In the synchronize_rcu() common case, we will have less than
>>>> SR_MAX_USERS_WAKE_FROM_GP number of users per GP. Waking up the kworker
>>>> is pointless just to free the last injected wait head since at that point,
>>>> all the users have already been awakened.
>>>>
>>>> Introduce a new counter to track this and prevent the wakeup in the
>>>> common case.
>>>>
>>>> Signed-off-by: Joel Fernandes (Google) <joel@xxxxxxxxxxxxxxxxx>
>>>> ---
>>>> Rebased on paul/dev of today.
>>>>
>>>>  kernel/rcu/tree.c | 36 +++++++++++++++++++++++++++++++-----
>>>>  kernel/rcu/tree.h |  1 +
>>>>  2 files changed, 32 insertions(+), 5 deletions(-)
>>>>
>>>> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
>>>> index 9fbb5ab57c84..bd29fe3c76bf 100644
>>>> --- a/kernel/rcu/tree.c
>>>> +++ b/kernel/rcu/tree.c
>>>> @@ -96,6 +96,7 @@ static struct rcu_state rcu_state = {
>>>>  	.ofl_lock = __ARCH_SPIN_LOCK_UNLOCKED,
>>>>  	.srs_cleanup_work = __WORK_INITIALIZER(rcu_state.srs_cleanup_work,
>>>>  		rcu_sr_normal_gp_cleanup_work),
>>>> +	.srs_cleanups_pending = ATOMIC_INIT(0),
>>>>  };
>>>>
>>>>  /* Dump rcu_node combining tree at boot to verify correct setup. */
>>>> @@ -1642,8 +1643,11 @@ static void rcu_sr_normal_gp_cleanup_work(struct work_struct *work)
>>>>  	 * the done tail list manipulations are protected here.
>>>>  	 */
>>>>  	done = smp_load_acquire(&rcu_state.srs_done_tail);
>>>> -	if (!done)
>>>> +	if (!done) {
>>>> +		/* See comments below. */
>>>> +		atomic_dec_return_release(&rcu_state.srs_cleanups_pending);
>>>>  		return;
>>>> +	}
>>>>
>>>>  	WARN_ON_ONCE(!rcu_sr_is_wait_head(done));
>>>>  	head = done->next;
>>>> @@ -1666,6 +1670,9 @@ static void rcu_sr_normal_gp_cleanup_work(struct work_struct *work)
>>>>
>>>>  		rcu_sr_put_wait_head(rcu);
>>>>  	}
>>>> +
>>>> +	/* Order list manipulations with atomic access. */
>>>> +	atomic_dec_return_release(&rcu_state.srs_cleanups_pending);
>>>>  }
>>>>
>>>>  /*
>>>> @@ -1673,7 +1680,7 @@ static void rcu_sr_normal_gp_cleanup_work(struct work_struct *work)
>>>>   */
>>>>  static void rcu_sr_normal_gp_cleanup(void)
>>>>  {
>>>> -	struct llist_node *wait_tail, *next, *rcu;
>>>> +	struct llist_node *wait_tail, *next = NULL, *rcu = NULL;
>>>>  	int done = 0;
>>>>
>>>>  	wait_tail = rcu_state.srs_wait_tail;
>>>> @@ -1699,16 +1706,35 @@ static void rcu_sr_normal_gp_cleanup(void)
>>>>  			break;
>>>>  	}
>>>>
>>>> -	// concurrent sr_normal_gp_cleanup work might observe this update.
>>>> -	smp_store_release(&rcu_state.srs_done_tail, wait_tail);
>>>> +	/*
>>>> +	 * Fast path, no more users to process. Remove the last wait head
>>>> +	 * if no inflight-workers. If there are in-flight workers, let them
>>>> +	 * remove the last wait head.
>>>> +	 */
>>>> +	WARN_ON_ONCE(!rcu);
>>>>
>>> This assumption is not correct. An "rcu" can be NULL in fact.
>>
>> Hmm I could never trigger that. Are you saying that is true after Neeraj recent patch or something else?
>> Note, after Neeraj patch to handle the lack of heads availability, it could be true so I requested
>> him to rebase his patch on top of this one.
>>
>> However I will revisit my patch and look for if it could occur but please let me know if you knew of a sequence of events to make it NULL.
>>>
> I think we should agree on your patch first otherwise it becomes a bit
> messy or go with a Neeraj as first step and then work on yours. So, I
> reviewed this patch based on latest Paul's dev branch. I see that Neeraj
> needs further work.

You are right. So the only change is to drop the warning and those braces. Agreed?

I will resend the patch and we can discuss during tomorrow's call as well.

Thanks!

Joel

>
> So this is true without Neeraj patch. Consider the following case:
>
>  3     2     1     0
> wh -> cb -> cb -> cb -> NULL
>
> we start to process from 2 and handle all clients, in the end,
> an "rcu" points to NULL and trigger the WARN_ON_ONCE. I see the
> splat during the boot:
>
> <snip>
> [ 0.927699][ T16] ------------[ cut here ]------------
> [ 0.930867][ T16] WARNING: CPU: 0 PID: 16 at kernel/rcu/tree.c:1721 rcu_gp_cleanup+0x37b/0x4a0
> [ 0.930490][ T1] acpiphp: ACPI Hot Plug PCI Controller Driver version: 0.5
> [ 0.931401][ T16] Modules linked in:
> [ 0.932400][ T1] PCI: Using configuration type 1 for base access
> [ 0.932771][ T16]
> [ 0.932773][ T16] CPU: 0 PID: 16 Comm: rcu_sched Not tainted 6.8.0-rc2-00089-g65ae0a6b86f0-dirty #1156
> [ 0.937780][ T16] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
> [ 0.939402][ T16] RIP: 0010:rcu_gp_cleanup+0x37b/0x4a0
> [ 0.940636][ T16] Code: b0 4b bd 72 09 48 81 ff e8 b0 4b bd 76 1e 4c 8b 27 48 83 c7 10 e8 a5 8e fb ff 4c 89 23 83 ed 01 74 0a 4c 89 e7 48 85 ff 75 d2 <0f> 0b 48 8b 35 14 d0 fd 02 48 89 1d 8d 64 d0 01 48 83 c4 08 48 c7
> [ 0.942402][ T16] RSP: 0018:ffff9b4a8008fe88 EFLAGS: 00010246
> [ 0.943648][ T16] RAX: 0000000000000000 RBX: ffffffffbd4bb0a8 RCX: 6c9b26c9b26c9b27
> [ 0.944751][ T16] RDX: 0000000000000000 RSI: 00000000374b92b6 RDI: 0000000000000000
> [ 0.945757][ T16] RBP: 0000000000000004 R08: fffffffffff54ea1 R09: 0000000000000000
> [ 0.946753][ T16] R10: ffff89070098c278 R11: 0000000000000001 R12: 0000000000000000
> [ 0.947752][ T16] R13: fffffffffffffcbc R14: 0000000000000000 R15: ffffffffbd3f1300
> [ 0.948764][ T16] FS: 0000000000000000(0000) GS:ffff8915efe00000(0000) knlGS:0000000000000000
> [ 0.950403][ T16] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 0.951656][ T16] CR2: ffff89163ffff000 CR3: 00000002eae26000 CR4: 00000000000006f0
> [ 0.952755][ T16] Call Trace:
> [ 0.953597][ T16] <TASK>
> [ 0.955404][ T16] ? __warn+0x80/0x140
> [ 0.956608][ T16] ? rcu_gp_cleanup+0x37b/0x4a0
> [ 0.957621][ T16] ? report_bug+0x15d/0x180
> [ 0.959403][ T16] ? handle_bug+0x3c/0x70
> [ 0.960616][ T16] ? exc_invalid_op+0x17/0x70
> [ 0.961620][ T16] ? asm_exc_invalid_op+0x1a/0x20
> [ 0.962627][ T16] ? rcu_gp_cleanup+0x37b/0x4a0
> [ 0.963622][ T16] ? rcu_gp_cleanup+0x36b/0x4a0
> [ 0.965403][ T16] ? __pfx_rcu_gp_kthread+0x10/0x10
> [ 0.967402][ T16] rcu_gp_kthread+0xf7/0x180
> [ 0.968619][ T16] kthread+0xd3/0x100
> [ 0.969602][ T16] ? __pfx_kthread+0x10/0x10
> [ 0.971402][ T16] ret_from_fork+0x34/0x50
> [ 0.972613][ T16] ? __pfx_kthread+0x10/0x10
> [ 0.973615][ T16] ret_from_fork_asm+0x1b/0x30
> [ 0.974624][ T16] </TASK>
> [ 0.975587][ T16] ---[ end trace 0000000000000000 ]---
> <snip>
>
> --
> Uladzislau Rezki
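
The "wh -> cb -> cb -> cb -> NULL" case above can be modeled in user space. What follows is only a minimal sketch, not the kernel code: struct node, walk_after_wait_tail() and cursor_is_null_after_walk() are hypothetical names, the open-coded for loop is assumed to mirror the iteration shape of llist_for_each_safe(), and the value 5 for MAX_USERS_WAKE_FROM_GP is an assumption standing in for SR_MAX_USERS_WAKE_FROM_GP. It illustrates that when fewer clients than the limit follow the wait head, the loop runs off the end of the list and the cursor is NULL at the point where the patch placed WARN_ON_ONCE(!rcu).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed value; stands in for SR_MAX_USERS_WAKE_FROM_GP. */
#define MAX_USERS_WAKE_FROM_GP 5
/* Arbitrary upper bound for this model only. */
#define MAX_MODEL_CLIENTS 16

/* Simplified stand-in for struct llist_node plus the wait-head check. */
struct node {
	struct node *next;
	bool is_wait_head;
};

/*
 * Process the clients that follow wait_tail, stopping at the next wait
 * head or after MAX_USERS_WAKE_FROM_GP clients. Returns the cursor as it
 * is left after the loop, i.e. the value the warning would have tested.
 */
struct node *walk_after_wait_tail(struct node *wait_tail, int *completed)
{
	struct node *rcu, *next;
	int done = 0;

	assert(wait_tail && wait_tail->is_wait_head);

	for (rcu = wait_tail->next; rcu; rcu = next) {
		next = rcu->next;
		if (rcu->is_wait_head)
			break;
		/* The kernel would complete (wake) this client here. */
		wait_tail->next = next;
		if (++done == MAX_USERS_WAKE_FROM_GP)
			break;
	}

	*completed = done;
	return rcu;
}

/*
 * Build "wh -> cb -> ... -> cb -> NULL" with nr_clients clients, walk it,
 * and report whether the cursor ended up NULL.
 */
bool cursor_is_null_after_walk(int nr_clients, int *completed)
{
	struct node nodes[1 + MAX_MODEL_CLIENTS];
	int i;

	assert(nr_clients >= 0 && nr_clients <= MAX_MODEL_CLIENTS);

	for (i = 0; i <= nr_clients; i++) {
		nodes[i].is_wait_head = (i == 0);
		nodes[i].next = (i < nr_clients) ? &nodes[i + 1] : NULL;
	}

	return walk_after_wait_tail(&nodes[0], completed) == NULL;
}
```

With three clients the walk handles all of them and the cursor ends as NULL, which matches the sequence described above. With more clients than the limit, the loop breaks early and the cursor stays non-NULL, which may explain why the warning does not fire under heavier load.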