On Tue, Oct 13, 2015 at 12:10:45PM +0000, Woodruff, Richard wrote:
> > From: Lucas Stach [mailto:l.stach@xxxxxxxxxxxxxx]
> > Sent: Tuesday, October 13, 2015 5:01 AM
>
> > So please help me to get this straight:
> >
> > Errata I688 only affects OMAP4 which is consequently the only user of
> > omap_interconnect_sync() in its WFI enter sequence, which in turn is
> > the only user of the SRAM scratch area to work around the erratum.
> >
> > The OMAP specific barrier implementation which should be used also on
> > other SoCs does not need any SRAM scratch, but uses a part of DRAM to do
> > the strongly ordered access.
> >
> > So it is safe to say that we only ever need to run the initcall
> > allocating the SRAM scratch area on OMAP4.
>
> There are 2 separate things here. One is the bus sync function and the
> other is the errata which requires a bus sync near WFI to avoid an errata.
>
> The rationale for the bus sync is similar to why there is a writel() and a
> writel_relaxed(). The bus sync has been used for a long time to ensure
> writes have landed and are not stuck in some posting buffer on path.
>
> A lot of historical drivers use a writel() where perhaps they could choose
> a more granular construct. If drivers were audited maybe the bus sync
> could be minimized on writel() path.

No, we're not going around that discussion loop again.

Linux requirements are that writel() at the CPU should be ordered with
respect to other writel()s and memory accesses which occur before the
writel().  However, buffering of the write by down-stream busses is
permitted, and where drivers want to ensure that the write has hit the
device, a read-back must be performed.

This requirement comes directly from the PCI specification, and is *not*
actually something that is specific to Linux.  Linux only adopts it from
PCI.
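
To make the read-back rule concrete, here's a rough userspace sketch.  It
is not kernel code: struct mock_dev, mock_writel(), mock_readl() and
write_and_flush() are made-up names that only model a posting buffer, on
the assumption that a read from the device cannot complete until earlier
posted writes to it have landed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a device behind a bus that may post (buffer)
 * writes.  None of this is a real kernel API; it only mimics the
 * semantics described above so the sketch can run in userspace. */
struct mock_dev {
	uint32_t reg;      /* value the device has actually seen */
	uint32_t posted;   /* write still sitting in a posting buffer */
	int      pending;  /* non-zero while 'posted' has not landed */
};

/* Ordered at the CPU, but the bus is allowed to hold on to the write. */
static void mock_writel(struct mock_dev *d, uint32_t val)
{
	d->posted = val;
	d->pending = 1;
}

/* A read from the same device pushes out any earlier posted write
 * before returning, so it doubles as a flush. */
static uint32_t mock_readl(struct mock_dev *d)
{
	if (d->pending) {
		d->reg = d->posted;
		d->pending = 0;
	}
	return d->reg;
}

/* Driver-side pattern: write, then read back when the write must have
 * hit the device before the driver carries on. */
static uint32_t write_and_flush(struct mock_dev *d, uint32_t val)
{
	assert(d != NULL);
	mock_writel(d, val);
	return mock_readl(d);  /* read-back forces the posted write out */
}
```

In a real driver the same shape would be a writel() followed by a readl()
of a register on the same device.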

We're not going to ever relax these rules: if people want to perform
accesses which do not conform to the above, they are free to - if they
don't care about the timing of the write hitting the device, they can
omit the read-back.  If they don't care about the write being ordered,
they can use writel_relaxed() (relaxed, because it doesn't have the
ordering guarantees of standard writel().)

It's up to the driver author to use the correct accessor(s) in their
drivers.  It's not for the architecture to decide that it can relax
these rules (if it does, it risks breaking a load of drivers out there.)

So, if people want to avoid the expensive OMAP bus sync on every access
in their drivers, they _have_ to consider whether each load or store
needs to be ordered.  In general, a sequence of writes to a device should
be implemented as a sequence of writel_relaxed(), and if it needs to be
ordered, the last write should be a writel() or accompanied by a barrier.

An example of this would be writing DMA controller configuration.  All
those writes should be writel_relaxed() except for the final writel()
which kicks off the DMA operation.

If you implement drivers using nothing but writel() and readl(), then
your performance _will_ suck, but that's entirely the driver's fault.

-- 
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.