On Mon, 18 Oct 2010, Kevin Cernekee wrote:

> Some systems do require additional steps along those lines, e.g.
>
> # ifdef CONFIG_SGI_IP28
> # define fast_iob()                                     \
>         __asm__ __volatile__(                           \
>                 ".set   push\n\t"                       \
>                 ".set   noreorder\n\t"                  \
>                 "lw     $0,%0\n\t"                      \
>                 "sync\n\t"                              \
>                 "lw     $0,%0\n\t"                      \
>                 ".set   pop"                            \
>                 : /* no output */                       \
>                 : "m" (*(int *)CKSEG1ADDR(0x1fa00004))  \
>                 : "memory")
>
> Maybe it would be better to use iob() instead of __sync() directly, so
> that it is easy to add extra steps for the CPUs that need them.  DEC
> and Loongson have custom __wbflush() implementations, and something
> similar could be added for your processor to implement the uncached
> dummy load.

 Ah, the old issue of the write-back barrier.  I can't comment on
Loongson, but for DEC IIRC the write-back buffer only needs to be taken
care of for uncached writes, which take a path separate from cached
writes.  I'd have to dig out the details to be sure.  IIRC the most
pathological case was the R2020 WB chip, but that was only used on
systems that didn't do DMA (namely the DECstation 3100 and 2100 boxes).

  Maciej