Re: [PATCH] uart: pl011: Add support to ZTE uart

Peter Hurley <peter@xxxxxxxxxxxxxxxxxx> · Sat, 21 Mar 2015 14:30:10 -0400

On 03/19/2015 10:32 PM, Jun Nie wrote:
> 2015-03-19 0:08 GMT+08:00 Peter Hurley <peter@xxxxxxxxxxxxxxxxxx <mailto:peter@xxxxxxxxxxxxxxxxxx>>:
>     On 03/18/2015 04:01 AM, Jun Nie wrote:
>     > 2015-03-14 2:09 GMT+08:00 Russell King - ARM Linux <linux@xxxxxxxxxxxxxxxx <mailto:linux@xxxxxxxxxxxxxxxx> <mailto:linux@xxxxxxxxxxxxxxxx <mailto:linux@xxxxxxxxxxxxxxxx>>>:
>     >     On Fri, Mar 13, 2015 at 02:02:49PM -0400, Peter Hurley wrote:
>     >     > On 03/13/2015 01:46 PM, Russell King - ARM Linux wrote:
>     >     > > On Fri, Mar 13, 2015 at 01:41:02PM -0400, Peter Hurley wrote:
>     >     > >> I think this would be much cleaner keeping the existing register index
>     >     > >> symbolic constants and using a remapping LUT with inline i/o accessors.
>     >     > >>
>     >     > >> Plus it has the advantage of disallowing certain indexes which are not
>     >     > >> mapped (rather than an accidental alias).
>     >     > >>
>     >     > >> I realize that was not the original model set forth but I don't think
>     >     > >> the original model contemplated a complete remap.
>     >     > >
>     >     > > In that case, we will want to ensure that we cache the data register
>     >     > > and flag register iomem pointer in loops, so we don't have to
>     >     > > constantly reload the base and offsets.
>     >     > >
>     >     > > This code gets run on some fairly slow ARM systems - including entirely
>     >     > > FPGA based systems.
>     >     >
>     >     > I would the think the compiler would factor ptr calculations outside
>     >     > any loop, but maybe not.
>     >
>     >     It can't.  IO accessors tend to have barriers, possibly calling out
>     >     to functions, and the barriers force the compiler to reload almost
>     >     everything which is not in a local variable.
>     >
>     >     (Note that none of the accessors in amba-pl011.c needs to have these
>     >     barriers, and the driver probably should've been converted to the
>     >     relaxed accessors when they were merged.)
>     >
>     > Could you help elaborate IO accessors? Any example code for reference?
>
>     By "i/o accessors", I mean wrapper functions that abstract the register i/o.
>     Generally, UART drivers handle significant model/platform i/o differences
>     with some functional abstraction, although the details vary.
>
>     Some drivers assign and call through the struct uart_port::serial_in() and
>     ::serial_out() methods, but that's usually because the underlying
>     platform i/o is different (readl() vs. inb()).
>
>     For this driver, what you outline below seems basically acceptable with the
>     following comments:
>
>     > I am thinking below style code according to your comments.
>     >
>     > enum pl011_reg {
>     > UART01x_DR/* Data read or written from the interface. */
>     > UART01x_RSR/* Receive status register (Read). */
>     > UART01x_ECR= 1;/* Error clear register (Write). */
>     > UART010_LCRH/* Line control register, high byte. */
>     > UART010_LCRM/* Line control register, middle byte. */
>     > ...
>     > };
>
>     I don't think it's necessary to re-define the register index
>     symbolic constants; the compiler will constant-fold expressions,
>     even through inline functions. For example,
>
>     static unsigned int pl011_in(struct uart_port *port, int index)
>     {
>             return readl_relaxed(port->membase + lut[index << 2];
>     }
>
> If function accept index as argument, then we have to calculate address with membase and index/offset in every loop. We need Russell's input here. Or my understanding is incorrect?

I did some quick experiments with various different i/o methods and
looked at the generated code, and the address is _already_ being
reloaded and recomputed within every loop.

I sent Russell a patch to fix this in the raw i/o accessors, so that
if the driver uses relaxed accessors then the address computation will
be hoisted from the loop (like it should be).

Also, readX()/writeX() is equivalent to the relaxed accessors when
!CONFIG_ARM_DMA_MEM_BUFFERABLE, so again the address computation will
be hoisted from the loop.

OTOH, if CONFIG_ARM_DMA_MEM_BUFFERABLE=y, then the address computation
does remain within the loop (because of the barrier) unless hoisted
manually, like

	void *addr = uap->port.membase + ....;

	while (readw(addr) & TXOFF)
		;

Honestly, I can't really see the need to hoist the address computation
out manually no matter how the code is organized; the extra load + add
will be dwarfed by the data barrier (and there is already an extra
load in outer_sync() anyway).

Regards,
Peter Hurley

--
To unsubscribe from this list: send the line "unsubscribe linux-serial" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html