Re: [PATCH 2/4 v0.5] sched/umcg: RFC: add userspace atomic helpers

Peter Oskolkov <posk@xxxxxxxxxx> · Thu, 9 Sep 2021 12:06:58 -0700

On Wed, Sep 8, 2021 at 4:39 PM Jann Horn <jannh@xxxxxxxxxx> wrote:

Thanks a lot for the reviews, Jann!

I understand how to address most of your comments. However, one issue
I'm not sure what to do about:

[...]

> If this function is not allowed to sleep, as the comment says...

[...]

> ... then I'm pretty sure you can't call fix_pagefault() here, which
> acquires the mmap semaphore (which may involve sleeping) and then goes
> through the pagefault handling path (which can also sleep for various
> reasons, like allocating memory for pagetables, loading pages from
> disk / NFS / FUSE, and so on).

<quote from peterz@ from
https://lore.kernel.org/lkml/20210609125435.GA68187@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/>:
  So a PF_UMCG_WORKER would be added to sched_submit_work()'s PF_*_WORKER
  path to capture these tasks blocking. The umcg_sleeping() hook added
  there would:

    put_user(BLOCKED, umcg_task->umcg_status);
    ...
</quote>

Which is basically what I am doing here: in sched_submit_work() I need
to read/write to userspace; and we cannot sleep in
sched_submit_work(), I believe.

If you are right that it is impossible to deal with pagefaults from
within non-sleepable contexts, I see two options:

Option 1: as you suggest, pin pages holding struct umcg_task in sys_umcg_ctl;

or

Option 2: add more umcg-related kernel state to task_struct so that
reading/writing to userspace is not necessary in sched_submit_work().

The first option sounds much better from the code simplicity point of
view, but I'm not sure if it is a viable approach, i.e. I'm afraid
we'll get a hard NACK here, as a non-privileged process will be able
to force the kernel to pin a page per task/thread. We may get around
it by first pinning a limited number of pages, then having the
userspace allocate structs umcg_task on those pages, so that a pinned
page would cover more than a single task/thread. And have a sysctl
that limits the number of pinned pages per MM.

Peter Z., could you, please, comment here? Do you think pinning pages
to hold structs umcg_task is acceptable?

[...]