On Mon, May 02, 2005 at 01:55:08AM +0200, peter fuerst wrote:

> this question is posted here in the hope, it will be picked up and answered
> by some of the <*@*engr.sgi.com> gurus, i apologize to the other members of
> this mailing-list for annoying them with it as well ;-)

They've sold their souls to the evil empire.

> Is it safe to assume, that memory bus errors (mc cpu_error_stat & 0x400) on
> IP28 - due to R10k's precise exception model - can be asynchronous only when
> caused by an aborted (misspeculated) instruction ?
> The R10k manual, experiences with spurious bus errors and experiments with
> "real" and speculated loads/stores seem to suggest this.
> Moreover, could it be enough to recognize the bus error as asynchronous,
> when the exception code in cp0_cause doesn't say "Instruction bus error
> exception" (6) or "Data bus..." (7), but "Interrupt" (0) ? (i.e. without
> analyzing the instruction at epc and register contents)
>
> Rationale for this question: if a memory bus error can reliably be identified
> as originating from a misspeculated memory access, it would be possible to get
> rid of the myriads of cache barriers before *loads* (stores will remain
> protected by cache barriers anyway) again, and spending some thousand machine
> cycles on analyzing a bus error every three days of uptime is clearly more
> efficient than having a cache barrier in kernel code every seventeen
> instructions...

Supposedly cache barrier instructions on the R10000 are relatively cheap, but
since there has been no need so far, we haven't actually benchmarked that.

As I recall the issue, loads would still fetch the line from memory, which in
the case of DMA buffers could result in stale data unless a cache flush is
performed after the DMA as well.

  Ralf
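
For reference, the exception-code test proposed in the quoted question could be sketched roughly as below. This is only an illustration of the idea, not kernel code: the function and enum names are made up for this example, 0x400 is the cpu_error_stat bit taken from the question, and the ExcCode field is assumed to sit in bits 6..2 of cp0_cause as on other MIPS CPUs. Whether "Interrupt" (0) really is sufficient to call a bus error asynchronous is exactly the open question, so the sketch does not settle it.

```c
#include <stdint.h>

/* ExcCode field of the CP0 Cause register (bits 6..2). */
#define CAUSE_EXCCODE_SHIFT 2
#define CAUSE_EXCCODE_MASK  0x1fu

#define EXCCODE_INT 0u  /* Interrupt */
#define EXCCODE_IBE 6u  /* Instruction bus error exception */
#define EXCCODE_DBE 7u  /* Data bus error exception */

/* Memory bus error bit in the MC cpu_error_stat register (value from the question). */
#define MC_CPU_ERR_MEMBUS 0x400u

/* Hypothetical classification, named for this sketch only. */
enum buserr_kind {
    BUSERR_NONE,     /* no memory bus error flagged by the MC */
    BUSERR_SYNC,     /* reported as IBE/DBE: precise, tied to the instruction at epc */
    BUSERR_ASYNC,    /* reported as interrupt: presumably from a misspeculated access */
    BUSERR_UNKNOWN   /* any other exception code: needs closer analysis */
};

static unsigned int cause_exccode(uint32_t cause)
{
    return (cause >> CAUSE_EXCCODE_SHIFT) & CAUSE_EXCCODE_MASK;
}

/* Classify using only cp0_cause and cpu_error_stat, i.e. without
 * decoding the instruction at epc or looking at register contents. */
static enum buserr_kind classify_bus_error(uint32_t cause, uint32_t cpu_error_stat)
{
    if (!(cpu_error_stat & MC_CPU_ERR_MEMBUS))
        return BUSERR_NONE;

    switch (cause_exccode(cause)) {
    case EXCCODE_IBE:
    case EXCCODE_DBE:
        return BUSERR_SYNC;
    case EXCCODE_INT:
        return BUSERR_ASYNC;
    default:
        return BUSERR_UNKNOWN;
    }
}
```

Note that this only covers the classification step. The stale-data concern for DMA buffers mentioned above is a separate issue that such a check does not address.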