Re: [PATCH v2 2/2] iopoll: Do not use timekeeping in read_poll_timeout_atomic()

Ulf Hansson <ulf.hansson@xxxxxxxxxx> · Fri, 12 May 2023 09:53:34 +0200

On Thu, 11 May 2023 at 14:44, Geert Uytterhoeven <geert@xxxxxxxxxxxxxx> wrote:
>
> Hi Ulf,
>
> On Thu, May 11, 2023 at 12:27 PM Ulf Hansson <ulf.hansson@xxxxxxxxxx> wrote:
> > On Wed, 10 May 2023 at 15:23, Geert Uytterhoeven
> > <geert+renesas@xxxxxxxxx> wrote:
> > > read_poll_timeout_atomic() uses ktime_get() to implement the timeout
> > > feature, just like its non-atomic counterpart.  However, there are
> > > several issues with this, due to its use in atomic contexts:
> > >
> > >   1. When called in the s2ram path (as typically done by clock or PM
> > >      domain drivers), timekeeping may be suspended, triggering the
> > >      WARN_ON(timekeeping_suspended) in ktime_get():
> > >
> > >         WARNING: CPU: 0 PID: 654 at kernel/time/timekeeping.c:843 ktime_get+0x28/0x78
> > >
> > >      Calling ktime_get_mono_fast_ns() instead of ktime_get() would get
> > >      rid of that warning.  However, that would break timeout handling,
> > >      as (at least on systems with an ARM architected timer), the time
> > >      returned by ktime_get_mono_fast_ns() does not advance while
> > >      timekeeping is suspended.
> > >      Interestingly, (on the same ARM systems) the time returned by
> > >      ktime_get() does advance while timekeeping is suspended, despite
> > >      the warning.
> >
> > Interesting, looks like we should spend some time to further
> > investigate this behaviour.
>
> Probably, I was a bit surprised by this behavior, too.
>
> > >   2. Depending on the actual clock source, and especially before a
> > >      high-resolution clocksource (e.g. the ARM architected timer)
> > >      becomes available, time may not advance in atomic contexts, thus
> > >      breaking timeout handling.
> > >
> > > Fix this by abandoning the idea that one can rely on timekeeping to
> > > implement timeout handling in all atomic contexts, and switch from a
> > > global time-based to a locally-estimated timeout handling.  In most
> > > (all?) cases the timeout condition is exceptional and an error
> > > condition, hence any additional delays due to underestimating wall clock
> > > time are irrelevant.
> >
> > I wonder if this isn't an oversimplification of the situation. Don't
> > we have timeout-error-conditions that we expect to happen quite
> > frequently?
>
> We may have some.  But they definitely do not happen when time
> does not advance, or they would have been mitigated long ago
> (the loop would never terminate).

Right, I was merely thinking of the case when timekeeping isn't
suspended, which of course is the most common case.
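
For reference, a minimal userspace sketch of what a locally-estimated
timeout could look like.  This is not the patch itself: the names
poll_local_budget(), fake_udelay(), fake_read() and struct fake_reg are
hypothetical stand-ins, and the 1 ns charge for loop overhead is an
assumption made only so that a zero delay still terminates.  The point
is that no clock is read; the loop just subtracts what it believes it
has waited from a local budget.

```c
#include <stdint.h>

/* Hypothetical stand-in for a hardware register read. */
typedef uint32_t (*poll_op_t)(void *ctx);

/* Hypothetical stand-in for udelay(); a real one would busy-wait. */
static void fake_udelay(unsigned long us)
{
	(void)us;
}

/* Fake register that reports "ready" from the ready_after-th read on. */
struct fake_reg {
	unsigned int reads;
	unsigned int ready_after;
};

static uint32_t fake_read(void *ctx)
{
	struct fake_reg *r = ctx;

	r->reads++;
	return r->reads >= r->ready_after ? 1u : 0u;
}

/*
 * Poll op() until (value & mask) is non-zero or the local budget is
 * used up.  timeout_us == 0 means "no timeout".
 * Returns 0 on success, -1 on timeout.
 */
static int poll_local_budget(poll_op_t op, void *ctx, uint32_t mask,
			     unsigned long delay_us, uint64_t timeout_us,
			     uint32_t *val)
{
	int64_t left_ns = (int64_t)timeout_us * 1000;

	for (;;) {
		*val = op(ctx);
		if (*val & mask)
			return 0;
		if (timeout_us && left_ns < 0) {
			/* One last read before giving up. */
			*val = op(ctx);
			return (*val & mask) ? 0 : -1;
		}
		if (delay_us) {
			fake_udelay(delay_us);
			left_ns -= (int64_t)delay_us * 1000;
		}
		/* Assumed 1 ns charge per iteration for loop overhead. */
		left_ns--;
	}
}
```

Time spent in op() is never charged against the budget here, so the
real elapsed time can only be longer than timeout_us, never shorter.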

>
> > If so, in these cases, we really don't want to continue looping longer
> > than actually needed, as then we will remain in the atomic context
> > longer than necessary.
> >
> > I guess some information about how big these additional delays could
> > be, would help to understand better. Of course, it's not entirely easy
> > to get that data, but did you run some tests to see how this changes?
>
> I did some timings (when timekeeping is available), and the differences
> are rather minor.  The delay and timeout parameters are in µs, and
> 1 µs is already a few orders of magnitude larger than the cycle time
> of a contemporary CPU.

Oh, I was expecting a bigger spread. If it's in that ballpark, we
should be fine.

I will run some tests on my side too, as I am curious to see the
behaviour. I will let you know the result either way.

>
> Under-estimates are due to the time spent in op() (depends on the
> user, typical use is a hardware device register read), udelay()
> (architecture/platform-dependent accuracy), and general loop overhead.

Yes, you are right. My main concern is the accuracy of udelay(), but
I may be totally wrong here.
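
To put a rough number on the under-estimate, here is a back-of-the-
envelope model, assuming the loop charges only delay_us per iteration
against its budget.  The function name estimate_wall_ns() and its
parameters are hypothetical, and any values fed into it are
illustrative inputs, not measurements from real hardware.

```c
#include <stdint.h>

/*
 * Estimate the wall-clock time (in ns) that passes before a loop which
 * only counts delay_us per iteration decides that timeout_us is over.
 *
 * op_ns:       assumed cost of one op() call (e.g. a register read)
 * overhead_ns: assumed per-iteration loop overhead, including any
 *              extra time udelay() spends beyond the requested delay
 *
 * Returns 0 if delay_us is 0, as this model does not apply then.
 */
static uint64_t estimate_wall_ns(uint64_t timeout_us, uint64_t delay_us,
				 uint64_t op_ns, uint64_t overhead_ns)
{
	uint64_t iterations;

	if (delay_us == 0)
		return 0;

	/* Iterations needed until the counted delays reach the timeout. */
	iterations = (timeout_us + delay_us - 1) / delay_us;

	/* Each iteration really costs the delay plus the uncounted parts. */
	return iterations * (delay_us * 1000 + op_ns + overhead_ns);
}
```

With made-up inputs of a 100 µs timeout, a 10 µs delay, 200 ns per
op() and 50 ns of overhead, this gives 102500 ns, i.e. the timeout
fires about 2.5 µs late under this model.  A less accurate udelay()
would show up as a larger overhead_ns.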

>
> > > Signed-off-by: Geert Uytterhoeven <geert+renesas@xxxxxxxxx>
> > > ---
> > > Alternatively, one could use a mixed approach (use both
> > > ktime_get_mono_fast_ns() and a local (under)estimate, and timeout on the
> > > earliest occasion), but I think that would complicate things without
> > > much gain.
> >
> > Another option could be to provide two different polling APIs for the
> > atomic use-case.
> >
> > One that keeps using ktime, which is more accurate and generally
> > favourable - and another, along the lines of what you propose, that
> > should be used by those that can't rely on timekeeping.
>
> At the risk of people picking the wrong one, leading to hard to
> find bugs?

I agree. If we don't need two APIs, it's certainly better to stick with one.

My main point is that we should not sacrifice "performance" for the
most common case, just to keep things simple, right?

Kind regards
Uffe