Re: [PATCH v6 1/2] sched/uclamp: Add a new sysctl to control RT default boost value

Valentin Schneider <valentin.schneider@xxxxxxx> · Mon, 06 Jul 2020 16:49:19 +0100

On 06/07/20 15:28, Qais Yousef wrote:
> CC: linux-fsdevel@xxxxxxxxxxxxxxx
> ---
>
> Peter
>
> I didn't do the
>
>       read_lock(&tasklist_lock);
>       smp_mb__after_spinlock();
>       read_unlock(&tasklist_lock);
>
> dance you suggested on IRC as it didn't seem necessary. But maybe I missed
> something.
>

So the annoying bit with just uclamp_fork() is that it happens *before* the
task is appended to the tasklist. This means that, without extra care, doing
the sync in uclamp_fork() could give us:

  CPU0 (sysctl write)                                CPU1 (concurrent forker)

                                                       copy_process()
                                                         uclamp_fork()
                                                           p.uclamp_min = state
    state = foo

    for_each_process_thread(p, t)
      update_state(t);
                                                         list_add(p)

i.e. the newly forked process would sidestep the update entirely. Now, with
Peter's suggested approach we are in a much better position. If we have this
in the sysctl update:

  state = foo;

  read_lock(&tasklist_lock);
  smp_mb__after_spinlock();
  read_unlock(&tasklist_lock);

  for_each_process_thread(p, t)
    update_state(t);

And this in the fork path:

  write_lock(&tasklist_lock);
  list_add(p);
  write_unlock(&tasklist_lock);

  sched_post_fork(p); // state re-read here; probably wants an mb first

Then we can no longer miss an update. If the forked p doesn't see the new
value, it *must* have been added to the tasklist before the updater loops
over it, so the loop will catch it. If it sees the new value, we're done.

AIUI, the above strategy doesn't require any use of RCU. The update_state()
and sched_post_fork() can race, but as per the above they should both be
writing the same value.