Re: [PATCH] ARM: OMAP2: erratum I688 handling disabled for AM335x

Russell King - ARM Linux <linux@xxxxxxxxxxxxxxxx> · Tue, 13 Oct 2015 13:24:13 +0100

On Tue, Oct 13, 2015 at 12:10:45PM +0000, Woodruff, Richard wrote:
> > From: Lucas Stach [mailto:l.stach@xxxxxxxxxxxxxx]
> > Sent: Tuesday, October 13, 2015 5:01 AM
> 
> > So please help me to get this straight:
> > 
> > Errata I688 only affects OMAP4 which is consequently the only user of
> > omap_interconnect_sync() in its WFI enter sequence, which in turn is
> > the only user of the SRAM scratch area to work around the erratum.
> > 
> > The OMAP specific barrier implementation which should be used also on
> > other SoCs does not need any SRAM scratch, but uses a part of DRAM to do
> > the strongly ordered access.
> > 
> > So it is safe to say that we only ever need to run the initcall
> > allocating the SRAM scratch area on OMAP4.
> 
> There are 2 separate things here.  One is the bus sync function and the
> other is the erratum, which requires a bus sync near WFI as its workaround.
> 
> The rationale for the bus sync is similar to why there is a writel() and a
> writel_relaxed().  The bus sync has been used for a long time to ensure
> writes have landed and are not stuck in some posting buffer along the path.
> 
> A lot of historical drivers use a writel() where perhaps they could choose
> a more granular construct.  If drivers were audited maybe the bus sync
> could be minimized on writel() path.

No, we're not going around that discussion loop again.

Linux requirements are that writel() at the CPU should be ordered with
respect to other writel()s and memory accesses which occur before the
writel().

However, buffering of the write by downstream buses is permitted, and
where drivers want to ensure that the write has hit the device, a
read-back must be performed.  This requirement comes directly from the
PCI specification, and is *not* something specific to Linux.  Linux
merely adopts it from PCI.
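
A toy model of that rule, as a sketch only: the struct and the dev_*()
helpers below are hypothetical user-space stand-ins (not kernel API), and
the bridge is assumed to hold a single posted write.  It shows why the
read-back matters: the write is only known to have reached the device once
a read from the same device has completed.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a device sitting behind a posting bridge. */
struct posted_dev {
	uint32_t reg;         /* value the device has actually seen */
	uint32_t pending;     /* write still buffered in the bridge */
	bool     has_pending; /* true while a write is posted but not landed */
};

/* A posted write: it completes at the CPU but may not have landed yet. */
static void dev_writel(struct posted_dev *d, uint32_t val)
{
	d->pending = val;
	d->has_pending = true;
}

/* A read cannot pass a posted write, so the bridge drains it first. */
static uint32_t dev_readl(struct posted_dev *d)
{
	if (d->has_pending) {
		d->reg = d->pending;
		d->has_pending = false;
	}
	return d->reg;
}

/* Write, then read back so the write is known to have hit the device. */
static void dev_write_flushed(struct posted_dev *d, uint32_t val)
{
	dev_writel(d, val);
	(void)dev_readl(d);
}
```

A driver that does not care when the write lands would call only the
dev_writel() equivalent and skip the read-back.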

We're never going to relax these rules.  People who want to perform
accesses which do not conform to the above are free to do so: if they
don't care about the timing of the write hitting the device, they can
omit the read-back, and if they don't care about the write being
ordered, they can use writel_relaxed() (relaxed because it doesn't have
the ordering guarantees of standard writel().)

It's up to the driver author to use the correct accessor(s) in their
drivers.  It's not for the architecture to decide that it can relax
these rules (if it does, it risks breaking a load of drivers out there.)

So, if people want to avoid the expensive OMAP bus sync on every access
in their drivers, they _have_ to consider whether each load or store
needs to be ordered.  In general, a sequence of writes to a device
should be implemented as a sequence of writel_relaxed(), and if it needs
to be ordered, the last write should be a writel() or be accompanied by
a barrier.  An example of this would be writing DMA controller
configuration: all those writes should be writel_relaxed() except for
the final writel() which kicks off the DMA operation.
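
To illustrate the DMA case, here is a minimal user-space sketch.  The
register layout, DMA_CTRL_START and dma_start() are made up for
illustration, and the writel()/writel_relaxed() below are mock stand-ins
that model only the ordering contract (the real accessors live in the
kernel and take an __iomem pointer).  The counter assumes the barrier is
the expensive part, as the OMAP bus sync is.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical register layout for an imaginary DMA controller. */
enum { DMA_SRC, DMA_DST, DMA_LEN, DMA_CTRL, DMA_NREGS };
#define DMA_CTRL_START 0x1u

/* Counts how many barriers (the costly part) have been issued. */
static unsigned int barrier_count;

/* Mock: plain store, no ordering against earlier memory accesses. */
static void writel_relaxed(uint32_t val, volatile uint32_t *addr)
{
	*addr = val;
}

/* Mock: barrier first, so earlier accesses are ordered before the store. */
static void writel(uint32_t val, volatile uint32_t *addr)
{
	atomic_thread_fence(memory_order_seq_cst);
	barrier_count++;
	*addr = val;
}

/* Program the transfer with relaxed writes; only the kick-off is ordered. */
static void dma_start(volatile uint32_t *regs, uint32_t src, uint32_t dst,
		      uint32_t len)
{
	writel_relaxed(src, &regs[DMA_SRC]);
	writel_relaxed(dst, &regs[DMA_DST]);
	writel_relaxed(len, &regs[DMA_LEN]);
	writel(DMA_CTRL_START, &regs[DMA_CTRL]);
}
```

In this model only the final store pays for a barrier; using writel()
for all four registers would pay for it four times.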

If you implement drivers using nothing but writel() and readl(), then your
performance _will_ suck, but that's entirely the driver's fault.

-- 
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.