On Thu, Jun 27, 2024 at 3:33 AM Frederic Weisbecker <frederic@xxxxxxxxxx> wrote: > > Le Wed, Jun 26, 2024 at 10:49:05PM +0530, Neeraj upadhyay a écrit : > > On Wed, Jun 26, 2024 at 7:43 PM Frederic Weisbecker <frederic@xxxxxxxxxx> wrote: > > > > > > Le Wed, Jun 12, 2024 at 01:57:20PM +0530, Neeraj upadhyay a écrit : > > > > On Wed, Jun 5, 2024 at 3:58 AM Paul E. McKenney <paulmck@xxxxxxxxxx> wrote: > > > > > > > > > > From: Frederic Weisbecker <frederic@xxxxxxxxxx> > > > > > > > > > > When the grace period kthread checks the extended quiescent state > > > > > counter of a CPU, full ordering is necessary to ensure that either: > > > > > > > > > > * If the GP kthread observes the remote target in an extended quiescent > > > > > state, then that target must observe all accesses prior to the current > > > > > grace period, including the current grace period sequence number, once > > > > > it exits that extended quiescent state. > > > > > > > > > > or: > > > > > > > > > > * If the GP kthread observes the remote target NOT in an extended > > > > > quiescent state, then the target further entering in an extended > > > > > quiescent state must observe all accesses prior to the current > > > > > grace period, including the current grace period sequence number, once > > > > > it enters that extended quiescent state. > > > > > > > > > > This ordering is enforced through a full memory barrier placed right > > > > > before taking the first EQS snapshot. However this is superfluous > > > > > because the snapshot is taken while holding the target's rnp lock which > > > > > provides the necessary ordering through its chain of > > > > > smp_mb__after_unlock_lock(). > > > > > > > > > > Remove the needless explicit barrier before the snapshot and put a > > > > > comment about the implicit barrier newly relied upon here. > > > > > > > > > > Signed-off-by: Frederic Weisbecker <frederic@xxxxxxxxxx> > > > > > Signed-off-by: Paul E. McKenney <paulmck@xxxxxxxxxx> > > > > > --- > > > > > .../Design/Memory-Ordering/Tree-RCU-Memory-Ordering.rst | 6 +++--- > > > > > kernel/rcu/tree.c | 7 ++++++- > > > > > 2 files changed, 9 insertions(+), 4 deletions(-) > > > > > > > > > > diff --git a/Documentation/RCU/Design/Memory-Ordering/Tree-RCU-Memory-Ordering.rst b/Documentation/RCU/Design/Memory-Ordering/Tree-RCU-Memory-Ordering.rst > > > > > index 5750f125361b0..728b1e690c646 100644 > > > > > --- a/Documentation/RCU/Design/Memory-Ordering/Tree-RCU-Memory-Ordering.rst > > > > > +++ b/Documentation/RCU/Design/Memory-Ordering/Tree-RCU-Memory-Ordering.rst > > > > > @@ -149,9 +149,9 @@ This case is handled by calls to the strongly ordered > > > > > ``atomic_add_return()`` read-modify-write atomic operation that > > > > > is invoked within ``rcu_dynticks_eqs_enter()`` at idle-entry > > > > > time and within ``rcu_dynticks_eqs_exit()`` at idle-exit time. > > > > > -The grace-period kthread invokes ``rcu_dynticks_snap()`` and > > > > > -``rcu_dynticks_in_eqs_since()`` (both of which invoke > > > > > -an ``atomic_add_return()`` of zero) to detect idle CPUs. > > > > > +The grace-period kthread invokes first ``ct_dynticks_cpu_acquire()`` > > > > > +(preceded by a full memory barrier) and ``rcu_dynticks_in_eqs_since()`` > > > > > +(both of which rely on acquire semantics) to detect idle CPUs. > > > > > > > > > > +-----------------------------------------------------------------------+ > > > > > | **Quick Quiz**: | > > > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c > > > > > index f07b8bff4621b..1a6ef9c5c949e 100644 > > > > > --- a/kernel/rcu/tree.c > > > > > +++ b/kernel/rcu/tree.c > > > > > @@ -769,7 +769,12 @@ static void rcu_gpnum_ovf(struct rcu_node *rnp, struct rcu_data *rdp) > > > > > */ > > > > > static int dyntick_save_progress_counter(struct rcu_data *rdp) > > > > > { > > > > > - rdp->dynticks_snap = rcu_dynticks_snap(rdp->cpu); > > > > > + /* > > > > > + * Full ordering against accesses prior current GP and also against > > > > > + * current GP sequence number is enforced by current rnp locking > > > > > + * with chained smp_mb__after_unlock_lock(). > > > > > + */ > > > > > > > > It might be worth mentioning that this chained smp_mb__after_unlock_lock() > > > > is provided by rnp leaf node locking in rcu_gp_init() and rcu_gp_fqs_loop() ? > > > > > > Right! > > > > > > How about this? > > > > > > > Looks good to me, thanks! Minor comment (ditto for the other patch) below > > > > > > > /* > > > * Full ordering against accesses prior current GP and also against > > > > Nit: "prior to current GP" ? > > Thanks. On a second thought and just to make sure we don't forget why we did > what we did after a few years, I expanded some more, still ok with the following? > Yes, looks good! Thanks Neeraj > /* > * Full ordering between remote CPU's post idle accesses and updater's > * accesses prior to current GP (and also the started GP sequence number) > * is enforced by rcu_seq_start() implicit barrier and even further by > * smp_mb__after_unlock_lock() barriers chained all the way throughout the > * rnp locking tree since rcu_gp_init() and up to the current leaf rnp > * locking. > * > * Ordering between remote CPU's pre idle accesses and post grace period's > * accesses is enforced by the below acquire semantic. > */ > > Thanks.