On Sun, 10 May 2020, Mikulas Patocka wrote:

> > > That's what we do on some other architectures to emulate the non-posted
> > > behavior of out[bwl], as required by PCI. I can't think of any reasons to
> > > have a barrier before in[bwl], or after write[bwl], but we generally want
> > > one after out[bwl]
> >
> > Alpha is weakly ordered, also WRT MMIO. The details are a bit obscure
> > (and were discussed before in a previous iteration of these patches), but
> > my understanding is multiple writes can be merged and writes can be
> > reordered WRT reads, even on UP. It's generally better for performance to
>
> We discussed it some times ago, and the conclusion was that reads and
> writes to the same device are not reordered on Alpha. Reads and writes to
> different devices or to memory may be reordered.

Except that "device" in this context is a particular MMIO location; most
peripherals span multiple locations, including in particular the RTC and
the UART in the Avanti line.

> In these problematic cases, we only access serial port or real time clock
> using a few ports (and these devices don't have DMA, so there's not any
> interaction with memory) - so I conclude that it is timing problem and not
> I/O reordering problem.

Individual PCI port locations correspond to different MMIO locations, so
yes, accesses to these can be reordered (merging won't happen due to the
use of the sparse address space).

As I noted, using a small program to verify the actual behaviour ought to
reveal what the problem really is. And /dev/mem can be mmapped for PCI
port I/O access on Alpha (some X servers do this), so this can be done
even from userland on a running system.

And if timing is indeed the culprit, then I think it would be best fixed
in the 82378IB southbridge, i.e. [1]:

"The I/O recovery mechanism in the SIO is used to add additional recovery
delay between PCI originated 8-bit and 16-bit I/O cycles to the ISA Bus.
The SIO automatically forces a minimum delay of four SYSCLKs between
back-to-back 8 and 16 bit I/O cycles to the ISA Bus. The delay is
measured from the rising edge of the I/O command (IOR# or IOW#) to the
falling edge of the next BALE. If a delay of greater than four SYSCLKs is
required, the ISA I/O Recovery Time Register can be programmed to increase
the delay in increments of SYSCLKs. Note that no additional delay is
inserted for back-to-back I/O "sub cycles" generated as a result of byte
assembly or disassembly. This register defaults to 8 and 16-bit recovery
enabled with two clocks added to the standard I/O recovery."

where it won't cause unnecessary overhead for native PCI devices, or
indeed excessive overhead for ISA devices.

It might be interesting to note that later SIO versions like the 82378ZB
increased the minimum to five SYSCLKs, so maybe a missing SYSCLK (that
can still be inserted by suitably programming the ICRT) is the source of
the problem?

References:

[1] "82378IB System I/O (SIO)", April 1993, Intel Corporation, Order
    Number: 290473-002, Section 4.1.17 "ICRT -- ISA Controller Recovery
    Timer Register"

  Maciej