Re: [PATCH v4 2/8] tty: serial: Add uart_fifo_timeout_ms()

Doug Anderson <dianders@xxxxxxxxxxxx> · Thu, 13 Jun 2024 07:02:25 -0700

Hi,

On Wed, Jun 12, 2024 at 11:56 PM Ilpo Järvinen
<ilpo.jarvinen@xxxxxxxxxxxxxxx> wrote:
>
> On Wed, 12 Jun 2024, Doug Anderson wrote:
> > On Wed, Jun 12, 2024 at 12:38 AM Ilpo Järvinen
> > <ilpo.jarvinen@xxxxxxxxxxxxxxx> wrote:
> > >
> > > On Mon, 10 Jun 2024, Douglas Anderson wrote:
> > >
> > > > The current uart_fifo_timeout() returns jiffies, which is not always
> > > > the most convenient for callers. Add a variant uart_fifo_timeout_ms()
> > > > that returns the timeout in milliseconds.
> > > >
> > > > NOTES:
> > > > - msecs_to_jiffies() rounds up, unlike nsecs_to_jiffies(). This is
> > > >   because msecs_to_jiffies() is actually intended for device drivers
> > > >   to calculate timeout value. This means we don't need to take the max
> > > >   of the timeout and "1" since the timeout will always be > 0 ms (we
> > > >   add 20 ms of slop).
> > > > - uart_fifo_timeout_ms() returns "unsigned int" but we leave
> > > >   uart_fifo_timeout() returning "unsigned long". This matches the
> > > >   types of msecs_to_jiffies().
> > > >
> > > > Suggested-by: Ilpo Järvinen <ilpo.jarvinen@xxxxxxxxxxxxxxx>
> > > > Signed-off-by: Douglas Anderson <dianders@xxxxxxxxxxxx>
> > > > ---
> > > >
> > > > Changes in v4:
> > > > - New
> > > >
> > > >  include/linux/serial_core.h | 15 +++++++++++----
> > > >  1 file changed, 11 insertions(+), 4 deletions(-)
> > > >
> > > > diff --git a/include/linux/serial_core.h b/include/linux/serial_core.h
> > > > index 8cb65f50e830..97968acfd564 100644
> > > > --- a/include/linux/serial_core.h
> > > > +++ b/include/linux/serial_core.h
> > > > @@ -889,14 +889,21 @@ unsigned int uart_get_divisor(struct uart_port *port, unsigned int baud);
> > > >  /*
> > > >   * Calculates FIFO drain time.
> > > >   */
> > > > -static inline unsigned long uart_fifo_timeout(struct uart_port *port)
> > > > +static inline unsigned int uart_fifo_timeout_ms(struct uart_port *port)
> > > >  {
> > > >       u64 fifo_timeout = (u64)READ_ONCE(port->frame_time) * port->fifosize;
> > > > +     unsigned int fifo_timeout_ms = div_u64(fifo_timeout, NSEC_PER_MSEC);
> > > >
> > > > -     /* Add .02 seconds of slop */
> > > > -     fifo_timeout += 20 * NSEC_PER_MSEC;
> > > > +     /*
> > > > +      * Add .02 seconds of slop. This also helps account for the fact that
> > > > +      * when we converted from ns to ms that we didn't round up.
> > > > +      */
> > > > +     return fifo_timeout_ms + 20;
> > > > +}
> > > >
> > > > -     return max(nsecs_to_jiffies(fifo_timeout), 1UL);
> > > > +static inline unsigned long uart_fifo_timeout(struct uart_port *port)
> > > > +{
> > > > +     return msecs_to_jiffies(uart_fifo_timeout_ms(port));
> > > >  }
> > >
> > > Hi,
> > >
> > > This is definitely heading in the right direction! However, it now does
> > > double conversion, first div_u64() and then msecs_to_jiffies(). Perhaps it
> > > would be better to retain the nsecs version (maybe rename it to _ns for
> > > consistency) and add _ms variant that does the nsec -> msec conversion.
> >
> > I spent a bit of time thinking about it and I don't agree. If you feel
> > very strongly about it or someone else wants to jump in and break the
> > tie then I can look again, but:
> >
> > 1. The comment before nsecs_to_jiffies() specifically states that it's
> > not supposed to be used for this purpose. Specifically, it says:
> >
> >  * Unlike {m,u}secs_to_jiffies, type of input is not unsigned int but u64.
> >  * And this doesn't return MAX_JIFFY_OFFSET since this function is designed
> >  * for scheduler, not for use in device drivers to calculate timeout value.
> >
> > ...so switching away from nsecs_to_jiffies() to msecs_to_jiffies() is
> > arguably a "bugfix", or at least avoids using the API in a way that's
> > against the documentation.
>
> Okay, I see. However, there's no way around using u64 here, even with your
> version that does not use nsecs_to_jiffies(), because nsecs is the most
> useful form of input when starting from frame_time; usecs is a bit
> coarse-grained for higher data rates.

Right. We have to start with u64 because the frame time is in ns and
we can only fit ~4.3 seconds' worth of ns in 32 bits (2^32 ns is about
4.29 s). That seems iffy.
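To put numbers on that, here's a small userspace C sketch (not kernel
code; the helper names are made up for illustration) comparing a 32-bit
and a 64-bit product of frame_time and fifosize. The configuration is my
own example, not something from the patch: 300 baud with 12-bit frames
(start bit, 8 data bits, parity, 2 stop bits) gives a frame time of
40,000,000 ns, and a 128-byte FIFO then needs about 5.12 s to drain.

```c
#include <stdint.h>

/* Hypothetical helper: FIFO drain time in ns using a 32-bit product.
 * Unsigned arithmetic wraps modulo 2^32 if the product is too big. */
static uint32_t drain_ns_u32(uint32_t frame_time_ns, uint32_t fifosize)
{
	return frame_time_ns * fifosize;
}

/* Hypothetical helper: same computation, widened to 64 bits first,
 * like the (u64) cast in uart_fifo_timeout_ms(). */
static uint64_t drain_ns_u64(uint32_t frame_time_ns, uint32_t fifosize)
{
	return (uint64_t)frame_time_ns * fifosize;
}
```

With those numbers the 64-bit product is 5,120,000,000 ns while the
32-bit one wraps to 825,032,704 ns (about 0.83 s), so the timeout would
silently come out far too short.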

> > 2. As mentioned in the commit message, nsecs_to_jiffies() truncates
> > where msecs_to_jiffies() rounds up. Presumably this difference is
> > related to the comment above where the "ns" version is intended for
> > the scheduler. Using the "ms" version allows us to get rid of the
> > extra call to "max()", which is a benefit. Technically, since the
> > timeout is at least 20 ms and the minimum HZ is 100, we didn't need
> > the max() anyway, but I guess someone thought it was cleaner, and now
> > we can definitely get rid of it.
> >
> > 3. These functions are inline anyway, so I don't think it's causing a
> > huge bloat of instructions. In fact, moving from 64-bit math to 32-bit
> > math sooner could make the code smaller.
> >
> > 4. I don't feel like it hurts the readability to convert down to ms
> > and then to jiffies. In fact, IMO it helps since it makes it more
> > obvious that we're working with ms.
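For point 2, a rough userspace model of the rounding difference may
help. The two helpers below are hypothetical stand-ins, not the kernel
implementations. The first assumes HZ divides 10^9 and just truncates,
like nsecs_to_jiffies(). The second assumes HZ <= 1000 and divides
1000, and rounds up the way msecs_to_jiffies() does in that
configuration (the real function has other code paths and clamps large
inputs to MAX_JIFFY_OFFSET).

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL
#define MSEC_PER_SEC 1000U

/* Hypothetical model of nsecs_to_jiffies(): truncating division.
 * Assumes hz divides NSEC_PER_SEC evenly. */
static unsigned long ns_to_jiffies_trunc(uint64_t ns, unsigned int hz)
{
	return (unsigned long)(ns / (NSEC_PER_SEC / hz));
}

/* Hypothetical model of msecs_to_jiffies(): rounds up to a whole jiffy.
 * Assumes hz <= 1000 and hz divides 1000 evenly. */
static unsigned long ms_to_jiffies_roundup(unsigned int ms, unsigned int hz)
{
	unsigned int ms_per_jiffy = MSEC_PER_SEC / hz;

	return (ms + ms_per_jiffy - 1) / ms_per_jiffy;
}
```

At HZ=100 (10 ms per jiffy), 1 ms truncates to 0 jiffies with the first
model but rounds up to 1 with the second, which is why a max(..., 1)
makes sense with the ns helper in general. With at least 20 ms of slop,
though, both give at least 2 jiffies at HZ=100, so, assuming HZ is never
below 100, the clamp could never fire here.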
>
> I'd be lying if I said I felt strongly about it

Fair enough. If someone wants to throw in an opinion and break the tie, then they can.

> but my only argument
> involves doing an extra divide, which is somewhat costly. It's a
> plain 32-bit divide, though, so not as bad as the u64 one, which is
> unavoidable.

We shouldn't be calling this in a loop anyway, so it's unlikely to
matter. In any case, I'd note that with the old code we had:

1. 64-bit multiply (time * fifosize)
2. 64-bit addition (result + 20ms)
3. 64-bit => 32-bit division (to jiffies)
4. 32-bit comparison against the value 1.
5. Conditional setting of the value to 1.

Now we have:

1. 64-bit multiply (time * fifosize)
2. 64-bit => 32-bit division (to ms)
3. 32-bit addition with a small immediate (20)
4. 32-bit addition (div round up) if HZ != 1000
5. 32-bit division (div round up) if HZ != 1000
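For anyone who wants to poke at the two pipelines above, here's a
userspace C sketch of both. It's a model, not the kernel code: HZ=250 is
an assumed example value, and model_nsecs_to_jiffies() /
model_msecs_to_jiffies() are hypothetical simplified stand-ins for the
real helpers that assume HZ divides 1000 and ignore the MAX_JIFFY_OFFSET
clamp. The numbered comments match the steps in the two lists.

```c
#include <stdint.h>

/* Assumed example value; the kernel's HZ is a config choice. */
#ifndef HZ
#define HZ 250
#endif

/* Userspace copies of the kernel constants used below. */
#define NSEC_PER_MSEC 1000000ULL
#define NSEC_PER_SEC  1000000000ULL
#define MSEC_PER_SEC  1000U

#if (MSEC_PER_SEC % HZ) != 0
#error "this sketch assumes HZ divides 1000"
#endif

/* Hypothetical model of nsecs_to_jiffies(): truncates. */
static unsigned long model_nsecs_to_jiffies(uint64_t ns)
{
	return (unsigned long)(ns / (NSEC_PER_SEC / HZ));
}

/* Hypothetical model of msecs_to_jiffies(): rounds up. */
static unsigned long model_msecs_to_jiffies(unsigned int ms)
{
	return (ms + (MSEC_PER_SEC / HZ) - 1) / (MSEC_PER_SEC / HZ);
}

/* Old uart_fifo_timeout(): stay in ns, add slop, convert, clamp. */
static unsigned long old_fifo_timeout(unsigned int frame_time_ns,
				      unsigned int fifosize)
{
	uint64_t t = (uint64_t)frame_time_ns * fifosize; /* 1. multiply */
	unsigned long j;

	t += 20 * NSEC_PER_MSEC;                         /* 2. add 20 ms */
	j = model_nsecs_to_jiffies(t);                   /* 3. divide */
	return j > 1 ? j : 1;                            /* 4./5. max(j, 1) */
}

/* New uart_fifo_timeout_ms(): convert to ms first, then add slop. */
static unsigned int new_fifo_timeout_ms(unsigned int frame_time_ns,
					unsigned int fifosize)
{
	uint64_t t = (uint64_t)frame_time_ns * fifosize; /* 1. multiply */

	return (unsigned int)(t / NSEC_PER_MSEC) + 20;   /* 2./3. */
}

/* New uart_fifo_timeout(): round-up add and divide happen in here. */
static unsigned long new_fifo_timeout(unsigned int frame_time_ns,
				      unsigned int fifosize)
{
	/* 4./5. round-up addition and division (no-ops if HZ == 1000) */
	return model_msecs_to_jiffies(new_fifo_timeout_ms(frame_time_ns,
							   fifosize));
}
```

Assuming frame_time is 86806 ns (10-bit frames at 115200 baud, rounded
up) and a 64-byte FIFO, the old path gives 6 jiffies and the new path
gives 7 at HZ=250 (25 ms, rounded up to the next 4 ms jiffy). Because
the ns -> ms step truncates and msecs_to_jiffies() then rounds up, the
new result should equal the old one or be at most one jiffy longer when
HZ divides 1000; at HZ=1000 the two match exactly.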

I didn't try disassembling to see what the compiler did, and it would
be different for each compiler / ISA / optimization level / value of
HZ, but I guess my point is that while we have one more divide (unless
HZ == 1000), we may have one fewer conditional. We also tend to do
our math with small immediates, which some ISAs can handle more
efficiently.

I think the real answer, though, is that this doesn't really matter
and that we should pick the solution that's cleaner/easier to
understand. I'm still in favor of the patch as it is. As I said, if
folks feel really strongly then it doesn't matter and I can change it,
but otherwise I'd rather keep it the way it is.

-Doug