On Mon, Jul 13, 2020 at 01:12:46PM +0100, Qais Yousef wrote:
> On 07/13/20 13:21, Peter Zijlstra wrote:
> > It's Monday, and I cannot get my brain working.. I cannot decipher the
> > comments you have with the smp_[rw]mb(); what actual ordering do they
> > enforce?
>
> It was a bit of paranoia to ensure that readers on other CPUs see the new
> value after this point.

IIUC that's not something any barrier can provide. Barriers can only
order between (at least) two memory operations:

	X = 1;			y = Y;
	smp_wmb();		smp_rmb();
	Y = 1;			x = X;

This guarantees that if y == 1, then x must also be 1, because the
left-hand side orders the store of Y after the store of X, while the
right-hand side orders the load of X after the load of Y. Therefore, if
the first load observes the last store, the second load must observe
the first store.

Without a second variable, barriers can't guarantee _anything_, which is
why any barrier comment should refer to at least two variables.

> > Also, your synchronize_rcu() relies on write_lock() being
> > non-preemptible, which isn't true on PREEMPT_RT.
> >
> > The below seems simpler...
>
> Hmm, maybe I am missing something obvious, but besides the race with
> fork, I was worried about another race, and that's what the
> synchronize_rcu() is trying to handle.
>
> It's the classic preemption in the middle of an RMW operation race.
>
> 	copy_process()			sysctl_uclamp
>
> 	  sched_post_fork()
> 	    __uclamp_sync_rt()
> 	      // read sysctl
> 	      // PREEMPT
> 					for_each_process_thread()
> 	      // RESUME
> 	      // write sysctl to p
>
> 2. sysctl_uclamp happens *during* sched_post_fork()
>
> There's the risk of the classic preemption in the middle of RMW, where
> another CPU could have changed the shared variable after the current
> CPU has already read it, but before writing it back.

Aah.. I see.

> I protect this with rcu_read_lock(), which as far as I know
> synchronize_rcu() will ensure: if we do the update during this section,
> we'll wait for it to finish. New forkees entering the rcu_read_lock()
> section will be okay because they should see the new value.
>
> spinlocks and mutexes seemed inferior to this approach.

Well, didn't we just write in another patch that p->uclamp_* was
protected by both rq->lock and p->pi_lock?
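
To make the two-variable point concrete, here is a minimal sketch of the
same message-passing pattern in kernel-style C; the data/flag names and
the writer()/reader() helpers are illustrative, not from the patch under
discussion:

	static int data;
	static int flag;

	/* writer side, e.g. the sysctl handler */
	static void writer(void)
	{
		WRITE_ONCE(data, 1);	/* X = 1 */
		smp_wmb();		/* order the flag store after the data store */
		WRITE_ONCE(flag, 1);	/* Y = 1 */
	}

	/* reader side, e.g. a new forkee */
	static void reader(void)
	{
		int f = READ_ONCE(flag);	/* y = Y */

		smp_rmb();			/* order the data load after the flag load */
		if (f)				/* observed the flag store, so... */
			BUG_ON(READ_ONCE(data) != 1);	/* ...the data store must be visible */
	}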
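
Likewise, a sketch of the rcu_read_lock()/synchronize_rcu() pairing Qais
describes; uclamp_sync_one() and sysctl_val are hypothetical stand-ins
for the per-task RMW in __uclamp_sync_rt() and the sysctl variable:

	static int sysctl_val;		/* hypothetical sysctl storage */

	/* hypothetical per-task RMW helper */
	static void uclamp_sync_one(struct task_struct *p, int val);

	/* sched_post_fork() path: do the RMW inside a read-side section */
	static void post_fork_sync(struct task_struct *p)
	{
		rcu_read_lock();
		uclamp_sync_one(p, READ_ONCE(sysctl_val));	/* read, then write to p */
		rcu_read_unlock();
	}

	/* sysctl path: publish the new value, then wait out in-flight RMWs */
	static void sysctl_uclamp_update(int new_val)
	{
		WRITE_ONCE(sysctl_val, new_val);
		synchronize_rcu();	/* all read-side sections that could
					 * have seen the old value are done */
		/* safe to walk the tasks; later forkees see the new value */
	}

The property relied on is that synchronize_rcu() returns only after
every rcu_read_lock() section that began before it has exited, so an
in-flight sched_post_fork() is waited for, while a later one must
observe the already-published value.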