Re: [PATCH] ACPI: Remove msleep() bloat from acpi_os_sleep()

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi,

I try to give some input from the timer perspective and maybe it helps
to clarify which should be the proper way to go for acpi_os_sleep(). It
also identifies some problems with the current documentation and
implementation of msleep/fsleep.

Len Brown <lenb@xxxxxxxxxx> writes:

> Attempting to grab all the loose ends from this discussion
> and put them into a list of 4 things:
>
> 1. Is it time to re-visit Jon's proposal to fix msleep, especially for
> small sleep values?

Lets have a deeper look to msleep() internals: msleep() uses timer list
timers. Because of the design of the timer wheel (granularity of buckets
increases with the levels) and because of the granularity of jiffies the
sleep values will be longer as specified. Let's assume we are executing
a msleep(1) on a HZ=250 system:

First msecs are mapped on jiffies, so this results in a 4ms timeout
value, as there is nothing shorter than 1 jiffie. Then the jiffie value
is handed over to timer code and msleep() adds another jiffie to the
timeout. The timeout is then 2 jiffies or 8ms. With this timeout a timer
list timer is queued. To make sure that timers will not fire early or
race with a concurrent incrementation of jiffie, timers are queued
always into the next bucket. As the timer will end up in the first level
of the timer wheel the granularity of the buckets is 1 jiffies. This
means that the timeout would be 3 jiffies in worst case.

The additional jiffie in msleep() is the historical prevention that the
sleep time is at least the specified time. This is handled by timer
wheel core code itself, so this extra jiffie could be removed. I will
provide a patch for it.

But even with this change, the worst case sleep length will be 8ms
instead of 1ms.

For comparison, see here a table with the values for all the steps
explained above taking some different msleep values. I already dropped
the addition of the extra jiffie. The granularity of the timer wheel
levels can be found at the first long comment in kernel/time/timer.c.

This is still a HZ=250 system:

msleep()   | msecs_to_jiffies() | worst case delay after | timer wheel
           |                    | enqueue (next bucket)  | level
--------------------------------------------------------------------------
1 ms       | 1 jiffie           | 2 jiffies, 8 ms        | 0
256 ms     | 64 jiffies         | 72 jiffies, 288 ms     | 1
257 ms     | 65 jiffies         | 72 jiffies, 288 ms     | 1
2048 ms    | 513 jiffies        | 576 jiffies, 2304 ms   | 2
2300 ms    | 575 jiffies        | 576 jiffies, 2304 ms   | 2
4096 ms    | 1024 jiffies       | 1088 jiffies, 4352 ms  | 2

The same values on a HZ=1000 system:

msleep()   | msecs_to_jiffies() | worst case delay after | timer wheel
           |                    | enqueue (next bucket)  | level
--------------------------------------------------------------------------
1 ms       | 1 jiffie           | 2 jiffies, 2 ms        | 0
256 ms     | 256 jiffies        | 264 jiffies, 264 ms    | 1
257 ms     | 257 jiffies        | 264 jiffies, 264 ms    | 1
2048 ms    | 2048 jiffies       | 2112 jiffies, 2112 ms  | 2
2300 ms    | 2300 jiffies       | 2304 jiffies, 2304 ms  | 2
4096 ms    | 4096 jiffies       | 4608 jiffies, 4608 ms  | 3


The documentation states, that msleep is not valid for short sleep
values. But as demonstrated with the two tables, this is not enirely
true. So the descision whether msleep() is suitable for the usecase is
not as simple as documented. It depends on the HZ configuration in
combination with the timeout value, and on the request how precise the
timeout has to be. And another important point is: hrtimers are more
expensive then timer list timers...

The documentation was originally written back in 2010 where the non
cascading timer wheel wasn't in place yet. So is has to be updated.

> 2. We agree that usleep_range() is appropriate for short acpi_os_sleep()
> due to ASL loops with small Sleep() values.
> But if we do something different for large and small values,
> where is the line between small and large?

As pointed out above - this depends on HZ and what the requirements of
the callsite are.

> fsleep anointed 20ms, but didn't document why.
> (and it made both short sleeps *and* long sleeps too slow to be useful
> here, IMO)

fsleep() just implements what the outdated documentation states. And it
adds a magic max value for usleep_range(). It seems to me, that fsleep()
somehow accidentially found the way into the kernel in 2020. As it is
now there this needs to be fixed and should take the above things into
account in some generic way.

> 3. Is usleep_range(min, max) with min= max bad?
> If it is good, why is virtually nobody else using min=max?

It's not bad to use it. It depends on your use case. If you really need
the precise sleep length, then it should be valid to use min = max.

I hope the timer information will help to find the proper solution for
acpi_os_sleep().

Thanks,

	Anna-Maria





[Index of Archives]     [Linux IBM ACPI]     [Linux Power Management]     [Linux Kernel]     [Linux Laptop]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Samba]     [Video 4 Linux]     [Device Mapper]     [Linux Resources]
  Powered by Linux