On Sat, Jan 09, 2016 at 02:36:03PM -0800, Andy Lutomirski wrote: > On Sat, Jan 9, 2016 at 12:01 AM, Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> wrote: > > On Thu, Jan 07, 2016 at 02:32:23PM -0800, H. Peter Anvin wrote: > >> On 01/07/16 14:29, H. Peter Anvin wrote: > >> > > >> > I would be very interested in knowing if replacing the final clflushopt > >> > with a clflush would resolve your problems (in which case the last mb() > >> > shouldn't be necessary either.) > >> > > >> > >> Nevermind. CLFLUSH is not ordered with regards to CLFLUSHOPT to the > >> same cache line. > >> > >> Could you add a sync_cpu(); call to the end (can replace the final mb()) > >> and see if that helps your case? > > > > s/sync_cpu()/sync_core()/ > > > > No. I still see failures on Baytrail and Braswell (Pineview is not > > affected) with the final mb() replaced with sync_core(). I can reproduce > > failures on Pineview by tweaking the clflush_cache_range() parameters, > > so I am fairly confident that it is validating the current code. > > > > iirc sync_core() is cpuid, a heavy serialising instruction, an > > alternative to mfence. Is there anything that else I can infer about > > the nature of my bug from this result? > > No clue, but I don't know much about the underlying architecture. > > Can you try clflush_cache_ranging one cacheline less and then manually > doing clflushopt; mb on the last cache line, just to make sure that > the helper is really doing the right thing? You could also try > clflush instead of clflushopt to see if that makes a difference. I had looked at increasing the range over which clflush_cache_range() runs (using roundup/rounddown by cache lines), but it took something like +/- 256 bytes to pass all the tests. And also did s/clflushopt/clflush/ to confirm that made no differnce. Bizarrely, diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c index 6000ad7..cf074400 100644 --- a/arch/x86/mm/pageattr.c +++ b/arch/x86/mm/pageattr.c @@ -141,6 +141,7 @@ void clflush_cache_range(void *vaddr, unsigned int size) for (; p < vend; p += clflush_size) clflushopt(p); + clflushopt(vend-1); mb(); } EXPORT_SYMBOL_GPL(clflush_cache_range); works like a charm. -Chris -- Chris Wilson, Intel Open Source Technology Centre _______________________________________________ dri-devel mailing list dri-devel@xxxxxxxxxxxxxxxxxxxxx http://lists.freedesktop.org/mailman/listinfo/dri-devel