On Mon, Oct 18, 2010 at 6:44 AM, Shinya Kuribayashi <skuribay@xxxxxxxxx> wrote:
> I suspect that SYNC insn alone is still not enough, isn't it?  In
> such systems with that 'deep' write buffer and data incoherency is
> visibly observed, there still may be data write transactions floating
> in the internal bus system.
>
> To make sure that all data (data inside processor's write buffer and
> data floating in the internal bus system), we need the following
> three steps:
>
> 1. Flush data cache
> 2. Uncached, dummy load operation from _DRAM_ (not somewhere else)
> 3. then SYNC instruction

Some systems do require additional steps along those lines, e.g.

# ifdef CONFIG_SGI_IP28
#  define fast_iob()                            \
        __asm__ __volatile__(                   \
                ".set   push\n\t"               \
                ".set   noreorder\n\t"          \
                "lw     $0,%0\n\t"              \
                "sync\n\t"                      \
                "lw     $0,%0\n\t"              \
                ".set   pop"                    \
                : /* no output */               \
                : "m" (*(int *)CKSEG1ADDR(0x1fa00004)) \
                : "memory")

Maybe it would be better to use iob() instead of __sync() directly, so
that it is easy to add extra steps for the CPUs that need them.  DEC
and Loongson have custom __wbflush() implementations, and something
similar could be added for your processor to implement the uncached
dummy load.  What do you think?
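
To make the suggestion a bit more concrete, here is a rough sketch of what such a hook might look like. It is only an illustration: my_cpu_wbflush(), my_iob() and dummy_dram_word are made-up names, not existing kernel symbols, and the dummy word is an ordinary variable standing in for an uncached DRAM mapping so that the sketch builds on any host. On real hardware the load would have to go through an uncached (CKSEG1-style) address that really hits DRAM. Step 1 of your list, the data cache flush, is assumed to have been done by the caller already.

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for a word in DRAM reached through an uncached
 * mapping.  On a real MIPS system this would be something like
 * *(volatile uint32_t *)CKSEG1ADDR(<physical DRAM address>).
 */
static volatile uint32_t dummy_dram_word;

/*
 * Sketch of a per-CPU write-buffer flush: an uncached dummy load
 * (step 2) followed by SYNC (step 3).  The name is hypothetical.
 */
static inline void my_cpu_wbflush(void)
{
#if defined(__mips__)
	/* Assumes MIPS II or later, where the SYNC instruction exists. */
	__asm__ __volatile__(
		".set	push\n\t"
		".set	noreorder\n\t"
		"lw	$0,%0\n\t"	/* step 2: dummy load, result discarded */
		"sync\n\t"		/* step 3: SYNC */
		".set	pop"
		: /* no output */
		: "m" (dummy_dram_word)
		: "memory");
#else
	/* Host fallback so the sketch compiles and runs anywhere. */
	(void)dummy_dram_word;		/* volatile read stands in for the load */
	__sync_synchronize();		/* GCC/Clang full memory barrier */
#endif
}

/*
 * Callers use the wrapper rather than a bare sync, so extra steps for a
 * given CPU live in one place.  The name is hypothetical.
 */
#define my_iob()	my_cpu_wbflush()
```

A driver would then call my_iob() after its cache flush and before kicking off DMA, and the CPU-specific details stay out of the driver.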