Hi Will,

> > FAOD, I think this assumption/requirement only applies to the plain
> > accessors (`inX', `readX', `ioreadX', etc.).
>
> It's also a requirement for the *_relaxed accessors, and there are drivers
> that rely on this being the case.

 Well, from the reading of memory-barriers.txt, be it as it stands, or with
your rewording applied, I take it the `*_relaxed' accessors do not guarantee
ordering WRT locking or DMA and hence a trailing barrier in `readX_relaxed'
is not necessary, and so we don't need a trailing `mb' with the Alpha port
either (but we do need a leading `mb' there, as well as with
`writeX_relaxed').

> > For performance reasons we may decide sometime to opt in for accessors
> > that do not suffer from the requirement to be strongly ordered WRT each
> > other, for the benefit to architectures that are not strongly ordered with
> > MMIO and that suffer a lot from serialising accesses that do not really
> > care, e.g. where you need to load a bunch of device registers or maybe
> > even device RAM in any order before making a serialised final request to
> > accept the values loaded.
>
> I'd expect accesses to device RAM to use something like ioremap_wc() if
> possible. In that case, the ordering of accesses is weakened by the
> underlying memory type in the page tables, but we're not yet at the point
> where we've figured out the portable semantics in this case. I plan to
> look at that once we've nailed normal ioremap()!

 Some CPU hardware, and certainly many if not most MIPS implementations, do
not give such fine-grained control over mapping.  I think all the MIPS CPUs
I dealt with only gave the choice between a cached (coherent or not) and an
uncached mapping, with the ordering being weak in all cases.  So the only
control over ordering was with explicit barrier instructions.
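 To make the usage pattern discussed above concrete (load a bunch of device
registers in any order, then make one serialised final request), here is a
minimal userspace sketch.  It is not kernel code and not taken from any
driver: `struct fake_dev', `write32_unordered', `full_barrier' and
`submit_request' are all hypothetical names, the "device" is ordinary memory,
and the C11 fence merely stands in for an explicit barrier instruction such
as Alpha `mb'.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical device model: four parameter registers and a doorbell. */
struct fake_dev {
	volatile uint32_t param[4];
	volatile uint32_t doorbell;
};

/* Stand-in for a weakly ordered register write; no barrier is implied. */
static inline void write32_unordered(volatile uint32_t *reg, uint32_t val)
{
	*reg = val;
}

/*
 * Stand-in for an explicit barrier instruction.  A C11 fence is only an
 * assumption for this userspace model; real MMIO needs the architecture's
 * own barrier.
 */
static inline void full_barrier(void)
{
	atomic_thread_fence(memory_order_seq_cst);
}

/*
 * Load all parameters in any order, then issue a single barrier before
 * the final write that asks the device to accept the values loaded.
 */
void submit_request(struct fake_dev *dev, const uint32_t vals[4])
{
	for (int i = 0; i < 4; i++)
		write32_unordered(&dev->param[i], vals[i]);

	full_barrier();		/* one barrier instead of one per access */
	write32_unordered(&dev->doorbell, 1);
}
```

 The point of the sketch is only the shape of the sequence: the parameter
writes carry no ordering cost individually, and the single barrier is paid
once, right before the serialised request.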
> > That piece of hardware is however rather peculiar and not an example of
> > the most common design seen nowadays and I am not sure if the extra
> > maintenance burden across all the ports for any additional accessors would
> > be outweighed by the benefit for the weakly ordered MMIO architectures
> > (where an execution stall can indeed count in hundreds of clock cycles per
> > barrier inserted) combined with the appreciation (i.e. actual use) level
> > from driver writers who do not necessarily grok all that weak ordering
> > business.
>
> If you use ioremap_wc() for device RAM and __raw_readX() for bunching up
> register accesses, then I don't think we need to add anything new, do we?

 Well, `__raw_*' accessors are never byte-swapped, at least not with the
MIPS port, making them a tad cumbersome for a driver that has no interest in
paying attention to any endianness mismatch between the CPU bus and the
device's peripheral bus.  Granted, this does not matter for this driver, as
it only has a chance to be used with DECstation (MIPS), DEC 3000 AXP (Alpha)
and VAXstation 4000 (VAX) hardware, all of which are little-endian
throughout (and not all of which we have support for upstream).  Still, I'd
consider it a hack, as obviously we do have drivers for options living on,
say, PCI, which is little-endian, that we want to use with processors
configured for big endianness on their FSB.

  Maciej