On Wed, Sep 22, 2021 at 01:05:12PM +0200, Peter Zijlstra wrote:
> Use rcu_user_{enter,exit}() calls to provide SMP ordering on context
> tracking state stores:
>
>   __context_tracking_exit()
>     __this_cpu_write(context_tracking.state, CONTEXT_KERNEL)
>     rcu_user_exit()
>       rcu_eqs_exit()
>         rcu_dynticks_eqs_exit()
>           rcu_dynticks_inc()
>             atomic_add_return() /* smp_mb */
>
>   __context_tracking_enter()
>     rcu_user_enter()
>       rcu_eqs_enter()
>         rcu_dynticks_eqs_enter()
>           rcu_dynticks_inc()
>             atomic_add_return() /* smp_mb */
>     __this_cpu_write(context_tracking.state, state)
>
> This separates the USER/KERNEL state stores with an smp_mb() on each
> side, so a user of context_tracking_state_cpu() knows the CPU must
> pass through an smp_mb() before the state can change.
>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx>

For the transformation to a negative errno return value and the name
change, from an RCU perspective:

Acked-by: Paul E. McKenney <paulmck@xxxxxxxxxx>

For the sampling of nohz_full userspace state:

Another approach is for the rcu_data structure's ->dynticks variable to
use its lower two bits to differentiate between idle, nohz_full
userspace, and kernel.  In theory, inlining should make this zero-cost
for the idle transition, and it should allow you to safely sample
nohz_full userspace state with a load and a couple of memory barriers
instead of an IPI.

To make this work nicely, the low-order bits have to be 00 for kernel,
and (say) 01 for idle and 10 for nohz_full userspace.  11 would be an
error.  The trick would be for rcu_user_enter() and rcu_user_exit() to
atomically increment ->dynticks by 2, for rcu_nmi_exit() to increment
it by 1, and for rcu_nmi_enter() to increment it by 3.  The state
sampling would need to change accordingly.

Does this make sense, or am I missing something?

							Thanx, Paul

> ---
>  include/linux/context_tracking_state.h | 12 ++++++++++++
>  kernel/context_tracking.c              |  7 ++++---
>  2 files changed, 16 insertions(+), 3 deletions(-)
>
> --- a/include/linux/context_tracking_state.h
> +++ b/include/linux/context_tracking_state.h
> @@ -45,11 +45,23 @@ static __always_inline bool context_trac
>  {
>  	return __this_cpu_read(context_tracking.state) == CONTEXT_USER;
>  }
> +
> +static __always_inline bool context_tracking_state_cpu(int cpu)
> +{
> +	struct context_tracking *ct = per_cpu_ptr(&context_tracking, cpu);
> +
> +	if (!context_tracking_enabled() || !ct->active)
> +		return CONTEXT_DISABLED;
> +
> +	return ct->state;
> +}
> +
>  #else
>  static inline bool context_tracking_in_user(void) { return false; }
>  static inline bool context_tracking_enabled(void) { return false; }
>  static inline bool context_tracking_enabled_cpu(int cpu) { return false; }
>  static inline bool context_tracking_enabled_this_cpu(void) { return false; }
> +static inline bool context_tracking_state_cpu(int cpu) { return CONTEXT_DISABLED; }
>  #endif /* CONFIG_CONTEXT_TRACKING */
>
>  #endif
> --- a/kernel/context_tracking.c
> +++ b/kernel/context_tracking.c
> @@ -82,7 +82,7 @@ void noinstr __context_tracking_enter(en
>  			vtime_user_enter(current);
>  			instrumentation_end();
>  		}
> -		rcu_user_enter();
> +		rcu_user_enter(); /* smp_mb */
>  	}
>  	/*
>  	 * Even if context tracking is disabled on this CPU, because it's outside
> @@ -149,12 +149,14 @@ void noinstr __context_tracking_exit(enu
>  		return;
>
>  	if (__this_cpu_read(context_tracking.state) == state) {
> +		__this_cpu_write(context_tracking.state, CONTEXT_KERNEL);
> +
>  		if (__this_cpu_read(context_tracking.active)) {
>  			/*
>  			 * We are going to run code that may use RCU.  Inform
>  			 * RCU core about that (ie: we may need the tick again).
>  			 */
> -			rcu_user_exit();
> +			rcu_user_exit(); /* smp_mb */
>  			if (state == CONTEXT_USER) {
>  				instrumentation_begin();
>  				vtime_user_exit(current);
> @@ -162,7 +164,6 @@ void noinstr __context_tracking_exit(enu
>  				instrumentation_end();
>  			}
>  		}
> -		__this_cpu_write(context_tracking.state, CONTEXT_KERNEL);
>  	}
>  	context_tracking_recursion_exit();
>  }
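To make the suggested arithmetic concrete, here is a minimal userspace
sketch of the two-low-bit counter encoding described above.  All names
(ct_user_enter() and friends) are hypothetical illustrations, not the
kernel's API, and C11 stdatomic's sequentially consistent
read-modify-writes stand in for the smp_mb() implied by
atomic_add_return():

/*
 * Sketch only: low-order two bits of the counter encode the state.
 *   00 - kernel
 *   01 - idle
 *   10 - nohz_full userspace
 *   11 - error
 */
#include <stdatomic.h>
#include <stdio.h>

#define CT_STATE_MASK	0x3UL
#define CT_KERNEL	0x0UL
#define CT_IDLE		0x1UL
#define CT_USER		0x2UL

static atomic_ulong dynticks = CT_KERNEL;	/* boot in the kernel */

/* kernel <-> nohz_full user: add 2, toggling bit 1 */
static void ct_user_enter(void) { atomic_fetch_add(&dynticks, 2); } /* ..00 -> ..10 */
static void ct_user_exit(void)  { atomic_fetch_add(&dynticks, 2); } /* ..10 -> ..00, carry */

/* kernel -> idle: add 1; idle -> kernel: add 3 */
static void ct_idle_enter(void) { atomic_fetch_add(&dynticks, 1); } /* ..00 -> ..01 */
static void ct_idle_exit(void)  { atomic_fetch_add(&dynticks, 3); } /* ..01 -> ..00, carry */

/* Remote sample: a single ordered load instead of an IPI. */
static unsigned long ct_sample_state(void)
{
	return atomic_load(&dynticks) & CT_STATE_MASK;
}

int main(void)
{
	ct_user_enter();
	printf("%lu\n", ct_sample_state());	/* 2: nohz_full user */
	ct_user_exit();
	ct_idle_enter();
	printf("%lu\n", ct_sample_state());	/* 1: idle */
	ct_idle_exit();
	printf("%lu\n", ct_sample_state());	/* 0: kernel */
	return 0;
}

Note that each enter/exit pair adds 4 in total, so the bits above the
mask still count round trips through each extended quiescent state,
and a sampled low-bit value of 11 would indicate a bug.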