Linus, I have added you to this discussion in hope you can clear any uncertainty there might be about MMIO ordering properties of the Alpha; a question for you is towards the end of this e-mail. On Thu, 21 Feb 2019, Mikulas Patocka wrote: > Do you think that we should fix this by identifying hardware that needs > the delays and adding the delays there? First of all we need to correctly identify what is really going on there. And I've had a look at our arch/alpha/kernel/io.c and arch/alpha/include/asm/io.h as they stand now and AFAICT what they have is completely broken and if it works anywhere, then by pure luck only, e.g. because external bus accesses complete quickly enough for code execution not to progress to a subsequent external bus access. The thing is taking for example `ioreadX' and `iowriteX' we have `mb' before a write and `mb' after a read. So if we do say (addresses are arbitrary for illustration only): iowrite8(0, 1); ioread8(2); then these effectively expand to chipset-specific: mb(); foo_iowrite8(0, 1); foo_ioread8(2); mb(); Which means `foo_iowrite8' and `foo_ioread8' are unordered WRT each other as there is no barrier between them. Worse yet for: iowrite8(0, 1); ioread8(1); the `ioread8' operation may not even appear externally, as the CPU may peek the value in the CPU's write buffer while the write is still pending. A similar consideration can be done for `readX' and `writeX', as well as a mixture between the two kinds of these accesses. We do need to add a preceding `mb' to all the `*readX' accessors to satisfy the strong ordering requirement WRT any preceding `*writeX' access. We need that leading `mb' in `*_relaxed' accessors as well. And we need to alias `mmiowb' to `wmb' (unless it's removed as I've seen with Will's recent patches). NB, the 82378ZB SIO adds a minimum of 5 ISA clocks between back-to-back accesses from PCI to ISA, which AFAIU should be enough for the PC87332 Super I/O. More can be configured if required, but I doubt that is needed, because it wasn't for 20+ years. > In my opinion, adding mb() to the port accessing functions is safer - it > is 6 line patch. The amount of code does not matter as long as it is wrong. We need to get this right. > Reading all the hardware manuals is time consuming and hardly anyone will > do it for 25 years old hardware. I do this regularly and I'm fine doing it this time too, also for the value of the education gained. I'll see if I can experiment with my hardware too, but regrettably that'll have to wait a couple of months at the very least (due to Brexit :( -- I have left my Alpha in the UK and the very last weekend I literally headed off to stay away from all of the upcoming mess). Meanwhile here is what I have tracked down in the architecture manual[1]: "Access to local I/O space does not cause any implicit read/write ordering; explicit barrier instructions must be used to ensure any desired ordering. Barrier instructions must be used: * After updating a memory-resident data structure and before writing a local I/O space location to notify the device of the updates. * Between multiple consecutive direct accesses to local I/O space, e.g. device control registers, if those accesses are expected to be ordered at the device. Again, note that implementations may cache not only memory-resident data structures, but also local I/O space locations." -- which in my opinion makes it clear that we need these barriers that were removed with commit 92d7223a7423 ("alpha: io: reorder barriers to guarantee writeX() and iowriteX() ordering #2"), in addition to ones added there (the `*_relaxed' accessors don't need the latter ones though). NB for some reason the whole chapter on I/O has been reduced in later revisions of the architecure manual[2][3] (as well as the shortened version of the book called the architecture handbook) to a single-page overview lacking any details. I think that does not make the architecture ordered any more strongly WRT MMIO though; instead it makes it even less specified. Linus, can you confirm or contradict my conclusions taken from the manual in the context of their later removal? You did the Alpha port back in mid-1990s (BTW, Jon 'maddog' spoke very fondly about that effort at FOSDEM earlier this month) and certainly had to know these details to complete it and you worked with people directly involved with the design of the Alpha at the time it was being developed, and given the somewhat terse formal specification you could therefore be the only person around I know of who can give a definite answer. I hope you still remember the bits. Questions, thoughts? References: [1] Richard L. Sites, ed., "Alpha Architecture Reference Manual", Digital Press, 1992, Chapter 8, "Input/Output (I)", Section 8.2.1 "Read/Write Ordering", <http://www.bitsavers.org/pdf/dec/alpha/Sites_AlphaArchitectureReferenceManual_1992.pdf>. [2] Richard L. Sites, Richard T. Witek, "Alpha AXP Architecture Reference Manual", Second Edition, Digital Press, 1995, Chapter 8 "Input/Output Overview (I)", <http://www.bitsavers.org/pdf/dec/alpha/Sites_AlphaAXPArchitectureReferenceManual_2ed_1995.pdf>. [3] "Alpha Architecture Reference Manual", Fourth Edition, Compaq Computer Corporation, January 2002, Chapter 8 "Input/Output Overview (I)", <https://download.majix.org/dec/alpha_arch_ref.pdf>. Maciej