Hello Sergey, On Tue, Jan 17, 2023 at 04:22:51PM +0300, Sergey Organov wrote: > Uwe Kleine-König <u.kleine-koenig@xxxxxxxxxxxxxx> writes: > > On Fri, Jan 13, 2023 at 09:43:33PM +0300, Sergey Organov wrote: > >> Use readl() instead of heavier imx_uart_readl() in the Rx ISR, as we know > >> we read registers that must not be cached. > >> > >> Signed-off-by: Sergey Organov <sorganov@xxxxxxxxx> > >> --- > >> drivers/tty/serial/imx.c | 5 +++-- > >> 1 file changed, 3 insertions(+), 2 deletions(-) > >> > >> diff --git a/drivers/tty/serial/imx.c b/drivers/tty/serial/imx.c > >> index be00362b8b67..f4236e8995fa 100644 > >> --- a/drivers/tty/serial/imx.c > >> +++ b/drivers/tty/serial/imx.c > >> @@ -890,14 +890,15 @@ static irqreturn_t __imx_uart_rxint(int irq, void *dev_id) > >> struct imx_port *sport = dev_id; > >> unsigned int rx, flg; > >> struct tty_port *port = &sport->port.state->port; > >> + typeof(sport->port.membase) membase = sport->port.membase; > >> u32 usr2; > >> > >> /* If we received something, check for 0xff flood */ > >> - usr2 = imx_uart_readl(sport, USR2); > >> + usr2 = readl(membase + USR2); > >> if (usr2 & USR2_RDR) > >> imx_uart_check_flood(sport, usr2); > >> > >> - while ((rx = imx_uart_readl(sport, URXD0)) & URXD_CHARRDY) { > >> + while ((rx = readl(membase + URXD0)) & URXD_CHARRDY) { > >> flg = TTY_NORMAL; > >> sport->port.icount.rx++; > > > > One of the motivations to introduce imx_uart_readl was to have a single > > place to add a debug output to be able to inspect what the driver is > > doing. > > > > I wonder where your need for higher speed comes from and if the compiler > > really generates more effective code with your change. > > Mostly it's because I'm obviously slowing things down a bit with the > patch to fight the flood, so I feel obliged to get things back on par > with the origin. Then, higher speed, let alone the time spent with > interrupts disabled and/or spinlocks taken, is always one of generic > goals for me. > > As for the generated code, with this patch I don't aim to affect code > generation, I rather avoid execution of part of existing code while > being on the most critical path. It should be quite obvious that not > executing some code is at least not slower than executing it. That's true, but I think it doesn't apply here. I would expect that the compiler "sees" for the call imx_uart_readl(sport, USR2) that the 2nd argument is constant and that for that value of offset the call is equivalent to readl(sport->port.membase + offset); So I doubt you're making anything quicker here. I tried the following patch on mainline (that is without the preceding patches in this series): diff --git a/drivers/tty/serial/imx.c b/drivers/tty/serial/imx.c index 757825edb0cd..cfc2f7057345 100644 --- a/drivers/tty/serial/imx.c +++ b/drivers/tty/serial/imx.c @@ -807,7 +807,7 @@ static irqreturn_t __imx_uart_rxint(int irq, void *dev_id) unsigned int rx, flg, ignored = 0; struct tty_port *port = &sport->port.state->port; - while (imx_uart_readl(sport, USR2) & USR2_RDR) { + while (readl(sport->port.membase + USR2) & USR2_RDR) { u32 usr2; flg = TTY_NORMAL; and the resulting code didn't change at all. For a bigger change (i.e. adding a variable for sport->port.membase and replacing two imx_uart_readl) the code changed quite a bit (it got 28 bytes bigger for imx_v6_v7_defconfig) and in the short time I tried I couldn't judge if the resulting code is better or not. So a change that explicitly doesn't execute the code that the compiler optimizes away anyhow isn't a win. Together with the fact that your patch makes register access use different idioms and so makes it harder to understand for a human I'd say the net benefit of your patch is negative. > > Please either drop the patch from your series or provide the differences > > the compiler produces and a benchmark. > > If your only objection against this patch is the desire to keep a single > place to add debug output, I'll be happy to tune the resulting code to > still have one. I don't see the need to optimize it. > That said, before we make a decision, could you please tell why register > shadows that the imx_uart_readl/writel are dealing with are needed in > the first place? It looks like all the registers that are shadowed are > readable as well. What's going on here, and if it happens to be a > speed-up, do we have any benchmarks? Not sure I did benchmarks back then, probably not. The main motivation was really to have that single access function. So I admit being guilty to have implemented an optimization without hard numbers just assuming that access to (cached) RAM is quicker than the register space. Best regards, Uwe -- Pengutronix e.K. | Uwe Kleine-König | Industrial Linux Solutions | https://www.pengutronix.de/ |
Attachment:
signature.asc
Description: PGP signature