On 03/19/2015 10:32 PM, Jun Nie wrote: > 2015-03-19 0:08 GMT+08:00 Peter Hurley <peter@xxxxxxxxxxxxxxxxxx <mailto:peter@xxxxxxxxxxxxxxxxxx>>: > On 03/18/2015 04:01 AM, Jun Nie wrote: > > 2015-03-14 2:09 GMT+08:00 Russell King - ARM Linux <linux@xxxxxxxxxxxxxxxx <mailto:linux@xxxxxxxxxxxxxxxx> <mailto:linux@xxxxxxxxxxxxxxxx <mailto:linux@xxxxxxxxxxxxxxxx>>>: > > On Fri, Mar 13, 2015 at 02:02:49PM -0400, Peter Hurley wrote: > > > On 03/13/2015 01:46 PM, Russell King - ARM Linux wrote: > > > > On Fri, Mar 13, 2015 at 01:41:02PM -0400, Peter Hurley wrote: > > > >> I think this would be much cleaner keeping the existing register index > > > >> symbolic constants and using a remapping LUT with inline i/o accessors. > > > >> > > > >> Plus it has the advantage of disallowing certain indexes which are not > > > >> mapped (rather than an accidental alias). > > > >> > > > >> I realize that was not the original model set forth but I don't think > > > >> the original model contemplated a complete remap. > > > > > > > > In that case, we will want to ensure that we cache the data register > > > > and flag register iomem pointer in loops, so we don't have to > > > > constantly reload the base and offsets. > > > > > > > > This code gets run on some fairly slow ARM systems - including entirely > > > > FPGA based systems. > > > > > > I would the think the compiler would factor ptr calculations outside > > > any loop, but maybe not. > > > > It can't. IO accessors tend to have barriers, possibly calling out > > to functions, and the barriers force the compiler to reload almost > > everything which is not in a local variable. > > > > (Note that none of the accessors in amba-pl011.c needs to have these > > barriers, and the driver probably should've been converted to the > > relaxed accessors when they were merged.) > > > > Could you help elaborate IO accessors? Any example code for reference? > > By "i/o accessors", I mean wrapper functions that abstract the register i/o. > Generally, UART drivers handle significant model/platform i/o differences > with some functional abstraction, although the details vary. > > Some drivers assign and call through the struct uart_port::serial_in() and > ::serial_out() methods, but that's usually because the underlying > platform i/o is different (readl() vs. inb()). > > For this driver, what you outline below seems basically acceptable with the > following comments: > > > I am thinking below style code according to your comments. > > > > enum pl011_reg { > > UART01x_DR/* Data read or written from the interface. */ > > UART01x_RSR/* Receive status register (Read). */ > > UART01x_ECR= 1;/* Error clear register (Write). */ > > UART010_LCRH/* Line control register, high byte. */ > > UART010_LCRM/* Line control register, middle byte. */ > > ... > > }; > > I don't think it's necessary to re-define the register index > symbolic constants; the compiler will constant-fold expressions, > even through inline functions. For example, > > static unsigned int pl011_in(struct uart_port *port, int index) > { > return readl_relaxed(port->membase + lut[index << 2]; > } > > If function accept index as argument, then we have to calculate address with membase and index/offset in every loop. We need Russell's input here. Or my understanding is incorrect? I did some quick experiments with various different i/o methods and looked at the generated code, and the address is _already_ being reloaded and recomputed within every loop. I sent Russell a patch to fix this in the raw i/o accessors, so that if the driver uses relaxed accessors then the address computation will be hoisted from the loop (like it should be). Also, readX()/writeX() is equivalent to the relaxed accessors when !CONFIG_ARM_DMA_MEM_BUFFERABLE, so again the address computation will be hoisted from the loop. OTOH, if CONFIG_ARM_DMA_MEM_BUFFERABLE=y, then the address computation does remain within the loop (because of the barrier) unless hoisted manually, like void *addr = uap->port.membase + ....; while (readw(addr) & TXOFF) ; Honestly, I can't really see the need to hoist the address computation out manually no matter how the code is organized; the extra load + add will be dwarfed by the data barrier (and there is already an extra load in outer_sync() anyway). Regards, Peter Hurley -- To unsubscribe from this list: send the line "unsubscribe linux-serial" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html