Re: [PATCH v2] Pre-emption control for userspace

ebiederm@xxxxxxxxxxxx (Eric W. Biederman) · Tue, 25 Mar 2014 13:31:04 -0700

Khalid Aziz <khalid.aziz@xxxxxxxxxx> writes:

> On 03/25/2014 12:59 PM, ebiederm@xxxxxxxxxxxx wrote:
>> Khalid Aziz <khalid.aziz@xxxxxxxxxx> writes:
>>
>>> This patch adds a way for a thread to request additional timeslice from
>>> the scheduler if it is about to be preempted, so it could complete any
>>> critical task it is in the middle of. .......
>>
>>
>> Let me see if I understand the problem.  Your simulated application has
>> a ridiculous number of threads (1000) all contending for a single lock
>> with fairly long lock hold times between 600 and 20000 clocks assuming
>> no cache line misses.  So 1000 threads contending for about 10usec or
>> 1/100 of a tick when HZ=1000.  Giving  you something like 1 chance in
>> 100 of being preempted while holding the lock.  With 1000 threads
>> those sound like pretty bad odds.
>
> This problem does not happen because threads are holding the lock for too long,
> rather it happens when a thread does a large number of things in its loop and
> one small part of it requires it to hold a lock. So it holds the lock for a very
> short time but what can happen is thread is executing in non-critical section of
> its loop, it finally gets to the critical section just as its timeslice is about
> to end, it grabs the lock and is pre-empted right away. Now we start building a
> convoy of threads that want the same lock. This problem can be avoided if the
> locking thread could be given additional time to complete its critical section,
> release the lock and yield the processor if it indeed was granted amnesty by the
> scheduler.

I would dearly like to see the math that shows such a change will
actually significantly change the probability of hitting this.  It
seems more likely that this will just be a way for threads to cheat and
get larger time slices.
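
For reference, here is a rough sketch of the kind of estimate I mean,
using the figures quoted earlier in this thread (about 10usec of lock
hold time against a 1000usec tick at HZ=1000).  It assumes the end of
the timeslice falls at a uniformly random point relative to the lock
acquisition and that acquisitions are independent; both are
simplifications, and the function names are made up for illustration.

```python
def preempt_probability(hold_usec, tick_usec):
    """Chance that the timeslice ends inside one critical section.

    Assumption: the preemption point is uniformly distributed over the
    tick, so the chance is simply hold time divided by tick length.
    """
    return min(1.0, hold_usec / tick_usec)


def prob_at_least_one(p, acquisitions):
    """Chance that at least one of `acquisitions` independent lock
    acquisitions is preempted while the lock is held."""
    return 1.0 - (1.0 - p) ** acquisitions


# Figures from the thread: ~10usec hold time, 1000usec tick (HZ=1000).
p_single = preempt_probability(10, 1000)

# Hypothetical workload: 1000 lock acquisitions in total.
p_any = prob_at_least_one(p_single, 1000)

print(f"per-acquisition chance: {p_single:.4f}")
print(f"chance of at least one preempted holder: {p_any:.6f}")
```

What this does not show is how much an extended timeslice lowers that
per-acquisition figure in practice, which is the number I am asking for.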

>> Maybe if this was limited to a cooperating set of userspace
>> tasks/threads this might not be too bad.  As this exists I have users
>> who would hunt me down with malicious intent if this code ever showed up
>> on our servers, because it would make life for every other application
>> on the server worse.
>>
>
> Yes, it is indeed limited to a cooperating set of userspace
> tasks/threads. Tasks/threads will explicitly choose to use this feature. It is a
> no-op for every one else.

It is absolutely not a no-op for me if my task can't be scheduled soon
enough because your task executed sched-preempt.

It means my latency goes up and the *random bad thing* will happen
because I missed my deadline because I was not scheduled fast enough.

>> or (b) limiting this to just a small
>> cooperating set of threads in a single cgroup.
>
> and that is almost what this patch does. It is not limited to a cgroup, rather
> to the tasks/threads that ask to use this feature.

Except you do not appear to be considering what could be scheduled in
your task's place.

You allow any task to extend its timeslice.

Which means I will get the question: why does really_important_job
only miss its latency guarantees when running on the same box as
sched_preempt_using_job?

Your change appears to have non-local effects that are extremely
difficult to debug.

Eric
