Re: [PATCH] ACPI: Remove msleep() bloat from acpi_os_sleep()

Anna-Maria Behnsen <anna-maria@xxxxxxxxxxxxx> · Thu, 29 Aug 2024 17:36:59 +0200

Hi,

"Rafael J. Wysocki" <rafael@xxxxxxxxxx> writes:

[...]

> The main difficulty is that acpi_os_sleep() really works on behalf of
> AML code (it is called when the Sleep() operator is evaluated in AML)
> and it is hard to say what behavior is intended there.
>
> We know that the msleep() precision is not sufficient at least in some
> cases (for example, a high-count loop in AML that sleeps for 5 ms in
> every iteration), but we don't really know what precision is needed.
>
> IMV it generally is reasonable to assume that firmware developers
> would not expect the exact sleep time (and the ACPI specification is
> not very precise regarding this either), but they wouldn't also expect
> sleep times much longer (in relative terms) than requested.  So
> roughly speaking they probably assume something between t and t + t/4,
> where t is the requested sleep time in milliseconds, regardless of the
> HZ value.
>
> Also, you said above that hrtimers were more expensive than the timer
> wheel timers, which we all agree with, but it is not clear to me what
> exactly the difference is.  For example, if there is a loop of 1000
> iterations in AML that each sleep for 5 ms, how much more overhead
> from using hrtimers would be there relative to using timer-wheel
> timers?

The general differences are:
- hrtimers are using a rbtree instead of hasing, which is more expensive
- in case the new hrtimer is the new first hrtimer, hardware has to be
  programmed, which is expensive too
- the timer wheel uses batching. hrtimers not. They might get an
  own interrupt for every hrtimer when they do not have the exact same
  expiry time.
- because of the above it is a lot more expensive to use hrtimers when
  hrtimers are cancelled and requeued with a new expiry time (like
  e.g. network timeouts do it)

It's hard to measure, because it depends... I briefly talked to Thomas
and his opinion is, that there is an overhead but it should be in a
range of ok.

> Generally speaking, Len's idea of using usleep_range() for all sleep
> lengths in acpi_os_sleep() is compelling because it leads to simple
> code, but it is not clear to me how much more it would cost relative
> to msleep() (or what issues it may provoke for that matter) and what
> delta value to use in there.  One idea is to derive the delta from the
> sleep length (say, take 1/4 of it like I did above), but the
> counter-argument is that this would cause the loops in AML in question
> to take measurably more time and there may be no real reason for it.

I created a patch for fsleep() - only complie tested - which should make
sure that the slack is maximum 25%. What do you think about it? Might it
be helpful?

---8<---

--- a/include/linux/delay.h
+++ b/include/linux/delay.h
@@ -78,15 +78,36 @@ static inline void ssleep(unsigned int s
 	msleep(seconds * 1000);
 }
 
-/* see Documentation/timers/timers-howto.rst for the thresholds */
+/**
+ * fsleep - flexible sleep which autoselects the best mechanism
+ * @usecs:	requested sleep duration in microseconds
+ *
+ * flseep() selects the best mechanism that will provide maximum 25% slack
+ * to the requested sleep duration. Therefore it uses:
+ *
+ * - udelay() loop for sleep durations <= 10 microseconds to avoid hrtimer
+ *   overhead for really short sleep durations.
+ *
+ * - usleep_range() for sleep durations which would lead with the usage of
+ *   msleep() to a slack larger than 25%. This depends on the granularity of
+ *   jiffies.
+ *
+ * - msleep() for all other sleep durations.
+ *
+ * Note: When CONFIG_HIGH_RES_TIMERS is not set, all sleeps are processed with
+ * the granularity of jiffies and the slack might exceed 25% especially for
+ * short sleep durations.
+ */
 static inline void fsleep(unsigned long usecs)
 {
+	const unsigned int max_slack_shift = 2;
+
 	if (usecs <= 10)
 		udelay(usecs);
-	else if (usecs <= 20000)
-		usleep_range(usecs, 2 * usecs);
+	else if (usecs < ((TICK_NSEC << max_slack_shift) / NSEC_PER_USEC))
+		usleep_range(usecs, usecs + (usecs >> max_slack_shift));
 	else
-		msleep(DIV_ROUND_UP(usecs, 1000));
+		msleep(DIV_ROUND_UP(usecs, USEC_PER_MSEC));
 }
 
 #endif /* defined(_LINUX_DELAY_H) */
---8<----

Thanks,

	Anna-Maria