Re: [RFC][PATCH 0/3] sched: User Managed Concurrency Groups

Peter Oskolkov <posk@xxxxxxx> · Wed, 15 Dec 2021 09:56:06 -0800

On Wed, Dec 15, 2021 at 2:06 AM Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>
> On Tue, Dec 14, 2021 at 07:46:25PM -0800, Peter Oskolkov wrote:
> > On Tue, Dec 14, 2021 at 12:55 PM Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> > >
> > > Hi,
> > >
> > > This is actually tested code; but still missing the SMP wake-to-idle machinery.
> > > I still need to think about that.
> >
> > Thanks, Peter!
> >
> > At a first glance, your main patch does not look much smaller than
> > mine, and I thought the whole point of re-doing it was to throw away
> > extra features and make things smaller/simpler...
>
> Well, simpler was the goal. I didn't really focus on size much. It isn't
> really big to begin with.
>
> But yes, it has 5 hooks now, 3 syscalls and lots of comments and all
> that under 900 lines, not bad I'd say.

My patchset had three hooks and two syscalls, and fewer new fields
added to task_struct... And similarly around 900 lines on the kernel
side in the main patch. So I am not sure why you believe that your
approach is simpler, unless there was something fundamentally wrong
with my approach. But tglx@ looked into it, and his remarks were more
about comments and the commit message and smaller things at a function
level, like an unneeded goto, than about the overall design...

>
> Also I think you wanted something like this? I'm not sure of the LAZY
> name, but I can't seem to come up with anything saner atm.
>
[...]
>  /*
> + * Enqueue tsk to it's server's runnable list and wake the server for pickup if
> + * so desired. Notable LAZY workers will not wake the server and rely on the
> + * server to do pickup whenever it naturally runs next.

No, I never suggested we needed per-server runnable queues: in all my
patchsets I had a single list of idle (runnable) workers.

[...]

>From another message:

>> Anyway, I'll test your patchset over the next week or so and let you
>> know if anything really needed is missing (other than waking an idle
>> server if there is one on a worker wakeup; this piece is definitely
> needed).

> Right, so the problem I'm having is that a single idle server ptr like
> before can trivially miss waking annother idle server.

I believe the approach I used in my patchset, suggested by Thierry
Delisle, works.

In short, there is a single idle server ptr for the kernel to work
with. The userspace maintains a list of idle servers. If the ptr is
NULL, the list is empty. When the kernel wakes the idle server it
sees, the server reaps the runnable worker list and wakes another idle
server from the userspace list, if available. This newly woken idle
server repoints the ptr to itself, checks the runnable worker list, to
avoid missing a woken worker, then goes to sleep.

Why do you think this approach is not OK?