Re: [RFC PATCH 00/86] Make the kernel preemptible

Steven Rostedt <rostedt@xxxxxxxxxxx> writes:

> On Tue,  7 Nov 2023 13:56:46 -0800
> Ankur Arora <ankur.a.arora@xxxxxxxxxx> wrote:
>
>> Hi,
>
> Hi Ankur,
>
> Thanks for doing this!
>
>>
>> We have two models of preemption: voluntary and full (and RT which is
>> a fuller form of full preemption.) In this series -- which is based
>> on Thomas' PoC (see [1]), we try to unify the two by letting the
>> scheduler enforce policy for the voluntary preemption models as well.
>
> I would say there's "NONE" which is really just a "voluntary" but with
> fewer preemption points ;-) But still should be mentioned, otherwise people
> may get confused.
>
>>
>> (Note that this is about preemption when executing in the kernel.
>> Userspace is always preemptible.)
>>
>
>
>> Design
>> ==
>>
>> As Thomas outlines in [1], to unify the preemption models we
>> want to: always have the preempt_count enabled and allow the scheduler
>> to drive preemption policy based on the model in effect.
>>
>> Policies:
>>
>> - preemption=none: run to completion
>> - preemption=voluntary: run to completion, unless a task of higher
>>   sched-class awaits
>> - preemption=full: optimized for low-latency. Preempt whenever a higher
>>   priority task awaits.
>>
>> To do this add a new flag, TIF_NEED_RESCHED_LAZY which allows the
>> scheduler to mark that a reschedule is needed, but is deferred until
>> the task finishes executing in the kernel -- voluntary preemption
>> as it were.
>>
>> The TIF_NEED_RESCHED flag is evaluated at all three of the preemption
>> points. TIF_NEED_RESCHED_LAZY only needs to be evaluated at ret-to-user.
>>
>>          ret-to-user    ret-to-kernel    preempt_count()
>> none           Y              N                N
>> voluntary      Y              Y                Y
>> full           Y              Y                Y
>
> Wait. The above is for when RESCHED_LAZY is to preempt, right?
>
> Then, shouldn't voluntary be:
>
>  voluntary      Y              N                N
>
> For LAZY, but
>
>  voluntary      Y              Y                Y
>
> For NEED_RESCHED (without lazy)

Yes. You are, of course, right. I was talking about the TIF_NEED_RESCHED flags
and in the middle switched to talking about how the voluntary model will
get to what it wants.

> That is, the only difference between voluntary and none (as you describe
> above) is that when an RT task wakes up, on voluntary, it sets NEED_RESCHED,
> but on none, it still sets NEED_RESCHED_LAZY?

Yeah exactly. Just to restate without mucking it up:

The TIF_NEED_RESCHED flag is evaluated at all three of the preemption
points. TIF_NEED_RESCHED_LAZY only needs to be evaluated at ret-to-user.

                  ret-to-user    ret-to-kernel    preempt_count()
NEED_RESCHED_LAZY    Y              N                N
NEED_RESCHED         Y              Y                Y
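
As a toy model of the table above (not kernel code; the Python names Tif,
Point and should_resched are made up for illustration), the rule is:
NEED_RESCHED is honoured at every preemption point, NEED_RESCHED_LAZY only
on return to user.

```python
from enum import Enum, Flag

class Tif(Flag):
    # Illustrative stand-ins for the two thread-info flags.
    NEED_RESCHED = 1
    NEED_RESCHED_LAZY = 2

class Point(Enum):
    RET_TO_USER = "ret-to-user"
    RET_TO_KERNEL = "ret-to-kernel"
    PREEMPT_COUNT = "preempt_count()"

def should_resched(flags, point):
    """Return True if a reschedule would happen at this preemption point."""
    if Tif.NEED_RESCHED in flags:
        return True                        # evaluated at all three points
    if Tif.NEED_RESCHED_LAZY in flags:
        return point is Point.RET_TO_USER  # deferred until return to user
    return False

# Print one row per flag, in the same column order as the table.
for flag in (Tif.NEED_RESCHED_LAZY, Tif.NEED_RESCHED):
    cells = ["Y" if should_resched(flag, p) else "N" for p in Point]
    print(f"{flag.name:<20}", *cells)
```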

Depending on which flag each preemption model sets, preemption would
happen at:

                  ret-to-user    ret-to-kernel    preempt_count()
none                 Y              N                N
voluntary            Y              Y                Y
full                 Y              Y                Y
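
A rough sketch of how the per-model rows follow from the flag each model
sets, going by the policies listed under "Design". This is a simplified
model, not the scheduler code: flag_on_wakeup and preemption_points are
hypothetical names, and it assumes the waking task is one that should run
next. The voluntary row of the table corresponds to a task of a higher
sched-class waking up.

```python
POINTS = ("ret-to-user", "ret-to-kernel", "preempt_count()")

# Where each flag is acted upon.
EVALUATED_AT = {
    "NEED_RESCHED_LAZY": {"ret-to-user"},
    "NEED_RESCHED": set(POINTS),
}

def flag_on_wakeup(model, higher_sched_class):
    """Flag the (modelled) scheduler sets on the running task at wakeup."""
    if model == "full":
        return "NEED_RESCHED"          # preempt as soon as possible
    if model == "voluntary" and higher_sched_class:
        return "NEED_RESCHED"          # e.g. an RT task waking up
    if model in ("none", "voluntary"):
        return "NEED_RESCHED_LAZY"     # run to completion in the kernel
    raise ValueError(f"unknown preemption model: {model}")

def preemption_points(model, higher_sched_class):
    """Set of preemption points at which the reschedule would happen."""
    return EVALUATED_AT[flag_on_wakeup(model, higher_sched_class)]

# Reproduce the table for a waking task of a higher sched-class.
for model in ("none", "voluntary", "full"):
    row = preemption_points(model, higher_sched_class=True)
    print(f"{model:<10}", *("Y" if p in row else "N" for p in POINTS))
```

With higher_sched_class=False the voluntary row collapses to the same
Y/N/N pattern as none, which is the LAZY case Steven describes.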

>>   The max-load numbers (not posted here) also behave similarly.
>
> It would be interesting to run any "latency sensitive" benchmarks.
>
> I wonder how cyclictest would work under each model with and without this
> patch?

Didn't post these numbers because I suspect that code isn't quite right,
but voluntary preemption for instance does what it promises:

# echo NO_FORCE_PREEMPT  > sched/features
# echo NO_PREEMPT_PRIORITY > sched/features    # preempt=none
# stress-ng --cyclic 1  --timeout 10
stress-ng: info:  [1214172] setting to a 10 second run per stressor
stress-ng: info:  [1214172] dispatching hogs: 1 cyclic
stress-ng: info:  [1214174] cyclic: sched SCHED_DEADLINE: 100000 ns delay, 10000 samples
stress-ng: info:  [1214174] cyclic:   mean: 9834.56 ns, mode: 3495 ns
stress-ng: info:  [1214174] cyclic:   min: 2413 ns, max: 3145065 ns, std.dev. 77096.98
stress-ng: info:  [1214174] cyclic: latency percentiles:
stress-ng: info:  [1214174] cyclic:   25.00%:       3366 ns
stress-ng: info:  [1214174] cyclic:   50.00%:       3505 ns
stress-ng: info:  [1214174] cyclic:   75.00%:       3776 ns
stress-ng: info:  [1214174] cyclic:   90.00%:       4316 ns
stress-ng: info:  [1214174] cyclic:   95.40%:      10989 ns
stress-ng: info:  [1214174] cyclic:   99.00%:      91181 ns
stress-ng: info:  [1214174] cyclic:   99.50%:     290477 ns
stress-ng: info:  [1214174] cyclic:   99.90%:    1360837 ns
stress-ng: info:  [1214174] cyclic:   99.99%:    3145065 ns
stress-ng: info:  [1214172] successful run completed in 10.00s

# echo PREEMPT_PRIORITY > features    # preempt=voluntary
# stress-ng --cyclic 1  --timeout 10
stress-ng: info:  [916483] setting to a 10 second run per stressor
stress-ng: info:  [916483] dispatching hogs: 1 cyclic
stress-ng: info:  [916484] cyclic: sched SCHED_DEADLINE: 100000 ns delay, 10000 samples
stress-ng: info:  [916484] cyclic:   mean: 3682.77 ns, mode: 3185 ns
stress-ng: info:  [916484] cyclic:   min: 2523 ns, max: 150082 ns, std.dev. 2198.07
stress-ng: info:  [916484] cyclic: latency percentiles:
stress-ng: info:  [916484] cyclic:   25.00%:       3185 ns
stress-ng: info:  [916484] cyclic:   50.00%:       3306 ns
stress-ng: info:  [916484] cyclic:   75.00%:       3666 ns
stress-ng: info:  [916484] cyclic:   90.00%:       4778 ns
stress-ng: info:  [916484] cyclic:   95.40%:       5359 ns
stress-ng: info:  [916484] cyclic:   99.00%:       6141 ns
stress-ng: info:  [916484] cyclic:   99.50%:       7824 ns
stress-ng: info:  [916484] cyclic:   99.90%:      29825 ns
stress-ng: info:  [916484] cyclic:   99.99%:     150082 ns
stress-ng: info:  [916483] successful run completed in 10.01s

This is with a background kernbench half-load.
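
For a quick side-by-side of the two runs, the ratios can be worked out
from the figures in the stress-ng output above. A small sketch (the
variable names are illustrative; the numbers are copied from the two runs,
all in nanoseconds):

```python
# Latencies in ns, copied from the two stress-ng runs above.
none_run = {"mean": 9834.56, "99.00%": 91181, "99.90%": 1360837, "max": 3145065}
voluntary_run = {"mean": 3682.77, "99.00%": 6141, "99.90%": 29825, "max": 150082}

# How many times larger each statistic is under preempt=none.
ratios = {key: none_run[key] / voluntary_run[key] for key in none_run}

for key, ratio in ratios.items():
    print(f"{key:>7}: none={none_run[key]:>10} voluntary={voluntary_run[key]:>9} "
          f"ratio={ratio:.1f}x")
```

The gap is widest in the tail: the mean differs by under 3x while the
99.90% and max latencies differ by more than 20x.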

Let me see if I can dig out the numbers without this series.

--
ankur



