Re: Alpha Avanti broken by 9ce8654323d69273b4977f76f11c9e2d345ab130

Arnd Bergmann <arnd@xxxxxxxx> · Wed, 22 Aug 2018 18:06:51 +0200

On Wed, Aug 22, 2018 at 5:50 PM Mikulas Patocka <mpatocka@xxxxxxxxxx> wrote:
> On Wed, 22 Aug 2018, Maciej W. Rozycki wrote:
> > On Wed, 22 Aug 2018, Sinan Kaya wrote:
>
> According to the Alpha handbook, non-overlapping accesses may be
> reordered.
>
> So if someone does
> writel(REG1);
> readl(REG2);
>
> readl may (according to the spec) reach the device before writel, although
> actual experiments suggest that the read flushes the queued writes.
>
> I would be quite interested to know why Linux developers decided that readl
> should be implemented as "read+barrier" and writel should be implemented
> as "barrier+write". Why is there this asymmetry in the barriers?

I can explain this part: those two barriers are used specifically to order
an MMIO access against a DMA access: a writel() may be used to start
a DMA operation copying data from RAM to the device, so we must
have a barrier between the store to that data and the store to the register
to ensure the data is visible to the device.
Similarly, a readl() may check the status of a register that tells us when
a DMA from device to RAM has completed. We must have a read
barrier between that MMIO load and the load from RAM to prevent
the data from being prefetched while the MMIO is still in progress.
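
To make the asymmetry concrete, here is a rough user-space sketch of the
two patterns. It is not the kernel's implementation: sketch_writel(),
sketch_readl(), struct fake_dev, start_tx() and rx_ready() are all
hypothetical names, ordinary memory stands in for device registers, and
C11 fences stand in for the architecture's write and read barriers.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for a device: plain memory, not real MMIO. */
struct fake_dev {
	volatile uint32_t doorbell;	/* written to start a DMA to the device */
	volatile uint32_t status;	/* bit 0 set when a DMA to RAM is done */
};

/* "barrier+write": order earlier buffer stores before the register store. */
static inline void sketch_writel(uint32_t val, volatile uint32_t *reg)
{
	atomic_thread_fence(memory_order_release);	/* models a write barrier */
	*reg = val;
}

/* "read+barrier": order the register load before later buffer loads. */
static inline uint32_t sketch_readl(const volatile uint32_t *reg)
{
	uint32_t val = *reg;

	atomic_thread_fence(memory_order_acquire);	/* models a read barrier */
	return val;
}

/* RAM -> device: fill the buffer first, then ring the doorbell. */
static uint32_t start_tx(struct fake_dev *dev, uint32_t *buf, uint32_t payload)
{
	buf[0] = payload;			/* store to DMA memory */
	sketch_writel(1, &dev->doorbell);	/* barrier, then MMIO store */
	return dev->doorbell;
}

/* device -> RAM: check the status register first, then read the buffer. */
static int rx_ready(struct fake_dev *dev, const uint32_t *buf, uint32_t *out)
{
	if (!(sketch_readl(&dev->status) & 1))	/* MMIO load, then barrier */
		return 0;
	*out = buf[0];				/* load from DMA memory */
	return 1;
}
```

In the transmit case the barrier has to come before the register store;
in the receive case it has to come after the register load, which is
where the "barrier+write" and "read+barrier" shapes come from.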

> Does ARM have some hardware magic that prevents reordering the write and
> the read in this case?

Most architectures have this AFAICT, ARM and x86 definitely do, and
PCI requires this to be true on the bus:

All MMIO accesses from a given CPU to a given device (according
to an architecture-specific definition of "device") are ordered with respect
to one another.

If the hardware does not guarantee that, for simple load/store operations
on uncached device memory, then we need a full barrier after each store
in addition to the write barrier needed for the DMA synchronization.
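
As a sketch of what that would look like, under the same assumptions as
any user-space model (ordinary memory instead of device registers, C11
fences instead of the architecture's barrier instructions, and the
hypothetical names sketch_writel_ordered(), sketch_readl_ordered() and
write_then_read()):

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical: writel for hardware that may reorder MMIO accesses. */
static inline void sketch_writel_ordered(uint32_t val, volatile uint32_t *reg)
{
	/* Write barrier for DMA: buffer stores before the register store. */
	atomic_thread_fence(memory_order_release);
	*reg = val;
	/* Extra full barrier: this store before any later load or store. */
	atomic_thread_fence(memory_order_seq_cst);
}

/* Hypothetical: readl keeps its usual "read+barrier" shape. */
static inline uint32_t sketch_readl_ordered(const volatile uint32_t *reg)
{
	uint32_t val = *reg;

	atomic_thread_fence(memory_order_acquire);
	return val;
}

/* The writel(REG1); readl(REG2); sequence from the question above. */
static uint32_t write_then_read(volatile uint32_t *reg1,
				const volatile uint32_t *reg2, uint32_t val)
{
	sketch_writel_ordered(val, reg1);
	return sketch_readl_ordered(reg2);
}
```

The trailing full barrier is the part that keeps the later read of REG2
from passing the write to REG1; the leading barrier alone only orders
the write against earlier stores.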

      Arnd