On Tue, Jun 11, 2024 at 07:16:37PM +0530, Neeraj upadhyay wrote: > On Mon, Jun 10, 2024 at 8:42 PM Paul E. McKenney <paulmck@xxxxxxxxxx> wrote: > > > > On Fri, Jun 07, 2024 at 07:21:55AM +0530, Neeraj upadhyay wrote: > > > On Thu, Jun 6, 2024 at 11:42 PM Paul E. McKenney <paulmck@xxxxxxxxxx> wrote: > > > > > > > > On Thu, Jun 06, 2024 at 11:28:07AM +0530, Neeraj upadhyay wrote: > > > > > On Wed, Jun 5, 2024 at 10:05 PM Frederic Weisbecker <frederic@xxxxxxxxxx> wrote: > > > > > > > > > > > > Le Tue, Jun 04, 2024 at 03:23:49PM -0700, Paul E. McKenney a écrit : > > > > > > > From: "Joel Fernandes (Google)" <joel@xxxxxxxxxxxxxxxxx> > > > > > > > > > > > > > > In the synchronize_rcu() common case, we will have less than > > > > > > > SR_MAX_USERS_WAKE_FROM_GP number of users per GP. Waking up the kworker > > > > > > > is pointless just to free the last injected wait head since at that point, > > > > > > > all the users have already been awakened. > > > > > > > > > > > > > > Introduce a new counter to track this and prevent the wakeup in the > > > > > > > common case. > > > > > > > > > > > > > > Signed-off-by: Joel Fernandes (Google) <joel@xxxxxxxxxxxxxxxxx> > > > > > > > Reviewed-by: Uladzislau Rezki (Sony) <urezki@xxxxxxxxx> > > > > > > > Signed-off-by: Paul E. McKenney <paulmck@xxxxxxxxxx> > > > > > > > --- > > > > > > > kernel/rcu/tree.c | 35 ++++++++++++++++++++++++++++++----- > > > > > > > kernel/rcu/tree.h | 1 + > > > > > > > 2 files changed, 31 insertions(+), 5 deletions(-) > > > > > > > > > > > > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c > > > > > > > index 6ba36d9c09bde..2fe08e6186b4d 100644 > > > > > > > --- a/kernel/rcu/tree.c > > > > > > > +++ b/kernel/rcu/tree.c > > > > > > > @@ -96,6 +96,7 @@ static struct rcu_state rcu_state = { > > > > > > > .ofl_lock = __ARCH_SPIN_LOCK_UNLOCKED, > > > > > > > .srs_cleanup_work = __WORK_INITIALIZER(rcu_state.srs_cleanup_work, > > > > > > > rcu_sr_normal_gp_cleanup_work), > > > > > > > + .srs_cleanups_pending = ATOMIC_INIT(0), > > > > > > > }; > > > > > > > > > > > > > > /* Dump rcu_node combining tree at boot to verify correct setup. */ > > > > > > > @@ -1633,8 +1634,11 @@ static void rcu_sr_normal_gp_cleanup_work(struct work_struct *work) > > > > > > > * the done tail list manipulations are protected here. > > > > > > > */ > > > > > > > done = smp_load_acquire(&rcu_state.srs_done_tail); > > > > > > > - if (!done) > > > > > > > + if (!done) { > > > > > > > + /* See comments below. */ > > > > > > > + atomic_dec_return_release(&rcu_state.srs_cleanups_pending); > > > > > > > > > > > > This condition is not supposed to happen. If the work is scheduled, > > > > > > there has to be a wait_queue in rcu_state.srs_done_tail. And decrementing > > > > > > may make things worse. > > > > > > > > > > > > > > > > I also don't see a scenario where this can happen. However, if we are > > > > > returning from here, given that for every queued work we do an > > > > > increment of rcu_state.srs_cleanups_pending, I think it's safer to > > > > > decrement in this > > > > > case, as that counter tracks only the work queuing and execution counts. > > > > > > > > > > atomic_inc(&rcu_state.srs_cleanups_pending); > > > > > if (!queue_work(sync_wq, &rcu_state.srs_cleanup_work)) > > > > > atomic_dec(&rcu_state.srs_cleanups_pending); > > > > > > > > Linus Torvald's general rule is that if you cannot imagine how a bug > > > > can happen, don't attempt to clean up after it. His rationale (which > > > > is *almost* always a good one) is that not knowing how the bug happens > > > > means that attempts to clean up will usually just make matters worse. > > > > And all too often, the clean-up code makes debugging more difficult. > > > > > > > > > > Ok. Thanks for sharing this info! > > > > > > > One example exception to this rule is when debug-objects detects a > > > > duplicate call_rcu(). In that case, we ignore that second call_rcu(). > > > > But the reason is that experience has shown that the usual cause really > > > > is someone doing a duplicate call_rcu(), and also that ignoring the > > > > second call_rcu() makes debugging easier. > > > > > > > > So what is it that Frederic and I are missing here? > > > > > > Maybe nothing. As kworker context does not modify srs_done_tail and > > > invalid values > > > of srs_done_tail can only be caused by the GP kthread manipulations > > > of srs_done_tail , my thought here was, we can keep the pending > > > rcu_sr_normal_gp_cleanup_work count consistent with the number of > > > queue_work() and kworker executions, even when we see unexpected > > > srs_done_tail values like these. However, as you described the general rule > > > is to not attempt any clean up for such scenarios. > > > > So "if (WARN_ON_ONCE(!done) return;"? > > > > Looks good, nit: one more closing parenthesis "if (WARN_ON_ONCE(!done)) return;" > > > Or is there a better way? > > > > I think, above check suffices . Very well, I will make this change on the next rebase. Thanx, Paul