On Thu, Jan 7, 2016 at 2:16 AM, Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> wrote: > On Mon, Oct 19, 2015 at 10:58:55AM +0100, Chris Wilson wrote: >> During testing we observed that the last cacheline was not being flushed >> from a >> >> mb() >> for (addr = addr & -clflush_size; addr < end; addr += clflush_size) >> clflushopt(); >> mb() >> >> loop (where the initial addr and end were not cacheline aligned). >> >> Changing the loop from addr < end to addr <= end, or replacing the >> clflushopt() with clflush() both fixed the testcase. Hinting that GCC >> was miscompling the assembly within the loop and specifically the >> alternative within clflushopt() was confusing the loop optimizer. >> >> Adding a barrier() into clflushopt() is enough for GCC to dtrt, but >> solving why GCC is not seeing the constraints from the alternative_io() >> would be smarter... >> >> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92501 >> Testcase: gem_tiled_partial_pwrite_pread/read >> Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> >> Cc: Ross Zwisler <ross.zwisler@xxxxxxxxxxxxxxx> >> Cc: H. Peter Anvin <hpa@xxxxxxxxxxxxxxx> >> Cc: Imre Deak <imre.deak@xxxxxxxxx> >> Cc: Daniel Vetter <daniel.vetter@xxxxxxxx> >> Cc: dri-devel@xxxxxxxxxxxxxxxxxxxxx >> --- >> arch/x86/include/asm/special_insns.h | 5 +++++ >> 1 file changed, 5 insertions(+) >> >> diff --git a/arch/x86/include/asm/special_insns.h b/arch/x86/include/asm/special_insns.h >> index 2270e41b32fd..0c7aedbf8930 100644 >> --- a/arch/x86/include/asm/special_insns.h >> +++ b/arch/x86/include/asm/special_insns.h >> @@ -199,6 +199,11 @@ static inline void clflushopt(volatile void *__p) >> ".byte 0x66; clflush %P0", >> X86_FEATURE_CLFLUSHOPT, >> "+m" (*(volatile char __force *)__p)); >> + /* GCC (4.9.1 and 5.2.1 at least) appears to be very confused when >> + * meeting this alternative() and demonstrably miscompiles loops >> + * iterating over clflushopts. >> + */ >> + barrier(); >> } > > Or an alternative: > > +#define alternative_output(oldinstr, newinstr, feature, output) \ > + asm volatile (ALTERNATIVE(oldinstr, newinstr, feature) \ > + : output : "i" (0) : "memory") > > I would really appreciate some knowledgeable folks taking a look at the > asm for clflushopt() as it still affects today's kernel and gcc. > > Fwiw, I have confirmed that arch/x86/mm/pageattr.c clflush_cache_range() > is similarly affected. Unless I'm mis-reading the asm, clflush_cache_range() is compiled correctly for me. (I don't know what the %P is for in the asm, but that shouldn't matter.) The ALTERNATIVE shouldn't even be visible to the optimizer. Can you attach a bad .s file and let us know what gcc version this is? (You can usually do 'make foo/bar/baz.s' to get a .s file.) I'd also be curious whether changing clflushopt to clwb works around the issue. --Andy _______________________________________________ dri-devel mailing list dri-devel@xxxxxxxxxxxxxxxxxxxxx http://lists.freedesktop.org/mailman/listinfo/dri-devel