On 01/07/16 13:54, Chris Wilson wrote: > > Whilst you are looking at this asm, note that we reload > boot_cpu_data.x86_cflush_size everytime around the loop. That's a small > but noticeable extra cost (especially when we are only flushing a single > cacheline). > I did notice that; I don't know if this is a compiler failure to do an obvious hoisting (which should then be merged with the other load) or a consequence of the volatile. It might be useful to report this to the gcc bugzilla to let them look at it. Either way, the diff looks good and you can add my: Acked-by: H. Peter Anvin <hpa@xxxxxxxxxxxxxxx> However, I see absolutely nothing wrong with the assembly other than minor optimization possibilities: since gcc generates an early-out test as well as a late-out test anyway, we could add an explicit: if (p < vend) return; before the first mb() at no additional cost (assuming gcc is smart enough to skip the second test; otherwise that can easily be done manually by replacing the for loop with a do { } while loop. I would be very interested in knowing if replacing the final clflushopt with a clflush would resolve your problems (in which case the last mb() shouldn't be necessary either.) Something like: void clflush_cache_range(void *vaddr, unsigned int size) { unsigned long clflush_size = boot_cpu_data.x86_clflush_size; char *vend = (char *)vaddr + size; char *p = (char *)((unsigned long)vaddr & ~(clflush_size - 1); if (p >= vend) return; mb(); for (; p < vend - clflush_size; p += clflush_size) clflushopt(p); clflush(p); /* Serializing and thus a barrier */ } _______________________________________________ dri-devel mailing list dri-devel@xxxxxxxxxxxxxxxxxxxxx http://lists.freedesktop.org/mailman/listinfo/dri-devel