On 2018-08-21 05:12, Maciej W. Rozycki wrote:
> On Mon, 20 Aug 2018, Sinan Kaya wrote:
> Likewise see memory-barriers.txt throughout concerning `mmiowb' (which is
> an obviously lighter weight barrier compared to `readX').
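(For reference, the usual `mmiowb' pattern that memory-barriers.txt has in
mind looks roughly like this -- a sketch with a hypothetical device lock and
control register, not code from any particular driver:)

	spin_lock(&dev->lock);
	writel(val, dev->regs + CTRL);	/* posted MMIO write */
	mmiowb();			/* keep the write ordered before the unlock,
					 * so a second CPU taking the lock cannot
					 * have its own writes reach the device first */
	spin_unlock(&dev->lock);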
Here is a better reference from memory-barriers.txt
 (*) readX(), writeX():

     Whether these are guaranteed to be fully ordered and uncombined with
     respect to each other on the issuing CPU depends on the characteristics
     defined for the memory window through which they're accessing.  On later
     i386 architecture machines, for example, this is controlled by way of the
     MTRR registers.

     Ordinarily, these will be guaranteed to be fully ordered and uncombined,
     provided they're not accessing a prefetchable device.
> See the next sentence too, and I am concerned about the "characteristics
> defined for the memory window" qualification here -- how is the memory
> window defined in the general sense?  For i386 we have the MTRR registers,
> but how about other platforms?
>  Anyway, if we were to guarantee that `readX' and `writeX' were fully
> ordered, then we would have to place barriers in matching places across
> accessors, i.e. either before or after the actual MMIO access, but
> uniformly across all of them, rather than having them mixed.  Placing them
> beforehand is normally better as buffers will often have drained already
> by that time, meaning the performance cost of the barrier will be lower.
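(To illustrate the "uniformly beforehand" arrangement described above -- a
sketch only, not actual kernel code; `__raw_writel'/`__raw_readl' stand in
for the raw device access:)

	static inline void writel(u32 val, volatile void __iomem *addr)
	{
		mb();			/* earlier accesses have drained */
		__raw_writel(val, addr);
	}

	static inline u32 readl(const volatile void __iomem *addr)
	{
		mb();			/* earlier accesses have drained */
		return __raw_readl(addr);
	}

With the barrier in the same position in every accessor, any two
readX()/writeX() calls issued by one CPU stay ordered against each other.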
>  As from commit 92d7223a7423 ("alpha: io: reorder barriers to guarantee
> writeX() and iowriteX() ordering #2") we have barriers in mixed positions,
> placed beforehand and afterwards in write and read accesses respectively,
> meaning that if we issue say:
>
> 	writel(x, foo);
> 	y = readl(bar);
>
> then the read from `bar' can be reordered ahead of the write to `foo',
> which is very, very bad, breaking requirements set out across
> io_ordering.txt and memory-barriers.txt.  I am fairly sure this is the
> cause of the regression observed.
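(Paraphrasing the shape the accessors take after that commit -- a sketch, not
the literal alpha code: the write-side barrier comes before the store and the
read-side barrier after the load, so nothing at all separates the store to
`foo' from the following load of `bar':)

	static inline void writel(u32 val, volatile void __iomem *addr)
	{
		mb();			/* barrier before the store only */
		__raw_writel(val, addr);
	}

	static inline u32 readl(const volatile void __iomem *addr)
	{
		u32 ret = __raw_readl(addr);

		mb();			/* barrier after the load only */
		return ret;
	}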
The location of the barrier is for observability with respect to memory
rather than with respect to other register accesses.

Now, I am used to ARM, where the architecture guarantees that register
accesses are ordered via the Device-nGnRE memory type.

If this architecture can reorder register accesses, then we need a barrier
both before and after the register access to make readX()/writeX() strongly
ordered.

Please confirm the arch behavior.
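(The "barrier before and after" shape I mean is roughly this -- a sketch of
the fully ordered form only, not a patch:)

	static inline void writel(u32 val, volatile void __iomem *addr)
	{
		mb();			/* order against earlier memory and MMIO */
		__raw_writel(val, addr);
		mb();			/* order against later memory and MMIO */
	}

	static inline u32 readl(const volatile void __iomem *addr)
	{
		u32 ret;

		mb();			/* order against earlier memory and MMIO */
		ret = __raw_readl(addr);
		mb();			/* order against later memory and MMIO */
		return ret;
	}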
> You need to make a corresponding update to `readX' and `ioreadX' then
> (and once that has been fixed we can consider the general matter of MMIO
> barriers independently).
>
>   Maciej