Re: [PATCH igt] igt/gem_fence_thresh: Use streaming reads for verify

Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> · Mon, 09 Oct 2017 14:56:57 +0100



Quoting Joonas Lahtinen (2017-10-09 14:36:27)
> Title: s/thresh/thrash/
> 
> On Wed, 2017-08-23 at 13:55 +0100, Chris Wilson wrote:
> > At the moment, the verify tests use an extremely brutal write-read of
> > every dword, degrading performance to UC. If we break those up into
> > cachelines, we can do a wcb write/read at a time instead, roughly 8x
> > faster. We lose the accuracy of the forced wcb flushes around every dword,
> > but we are retaining the overall behaviour of checking reads following
> > writes instead. To compensate, we do check a single dword write/read
> > before using wcb aligned accesses.
> > 
> > Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
> 
> <SNIP>
> 
> > @@ -104,15 +109,78 @@ bo_copy (void *_arg)
> >       return NULL;
> >  }
> >  
> > +#if defined(__x86_64__) && !defined(__clang__)
> > +#define MOVNT 512
> > +
> > +#pragma GCC push_options
> > +#pragma GCC target("sse4.1")
> > +
> > +#include <smmintrin.h>
> > +__attribute__((noinline))
> > +static void copy_wc_page(void *dst, void *src)
> > +{
> > +     if (igt_x86_features() & SSE4_1) {
> > +             __m128i *S = (__m128i *)src;
> > +             __m128i *D = (__m128i *)dst;
> > +
> > +             for (int i = 0; i < PAGE_SIZE/CACHELINE; i++) {
> > +                     __m128i tmp[4];
> > +
> > +                     tmp[0] = _mm_stream_load_si128(S++);
> > +                     tmp[1] = _mm_stream_load_si128(S++);
> > +                     tmp[2] = _mm_stream_load_si128(S++);
> > +                     tmp[3] = _mm_stream_load_si128(S++);
> > +
> > +                     _mm_store_si128(D++, tmp[0]);
> > +                     _mm_store_si128(D++, tmp[1]);
> > +                     _mm_store_si128(D++, tmp[2]);
> > +                     _mm_store_si128(D++, tmp[3]);
> > +             }
> > +     } else
> > +             memcpy(dst, src, PAGE_SIZE);
> > +}
> 
> Not lib/ material?

Yes. But you know it's easier to make it work for one case than all.
-Chris
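
For reference, a rough sketch of what a generalised, lib/-style variant of
copy_wc_page() could look like. This is not part of the patch: the names
copy_from_wc() and stream_copy_sse41() are hypothetical, and it assumes
GCC or clang on x86-64, using __builtin_cpu_supports() in place of
igt_x86_features(). It takes an arbitrary length rather than one page and
falls back to memcpy() when SSE4.1 is missing or either pointer is not
16-byte aligned.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__x86_64__)
#include <smmintrin.h>

/* Hypothetical helper; both pointers must be 16-byte aligned. */
__attribute__((target("sse4.1")))
static void stream_copy_sse41(void *dst, const void *src, size_t len)
{
	__m128i *s = (__m128i *)src;
	__m128i *d = (__m128i *)dst;

	/* One 64-byte cacheline per iteration: 4 streaming loads, 4 stores. */
	for (; len >= 64; len -= 64) {
		__m128i tmp[4];

		tmp[0] = _mm_stream_load_si128(s++);
		tmp[1] = _mm_stream_load_si128(s++);
		tmp[2] = _mm_stream_load_si128(s++);
		tmp[3] = _mm_stream_load_si128(s++);

		_mm_store_si128(d++, tmp[0]);
		_mm_store_si128(d++, tmp[1]);
		_mm_store_si128(d++, tmp[2]);
		_mm_store_si128(d++, tmp[3]);
	}

	/* Tail shorter than a cacheline: plain copy. */
	memcpy(d, s, len);
}
#endif

/* Hypothetical entry point: copy len bytes, streaming where possible. */
static void copy_from_wc(void *dst, const void *src, size_t len)
{
#if defined(__x86_64__)
	if (__builtin_cpu_supports("sse4.1") &&
	    (((uintptr_t)dst | (uintptr_t)src) & 15) == 0) {
		stream_copy_sse41(dst, src, len);
		return;
	}
#endif
	/* No SSE4.1, unaligned pointers, or not x86-64. */
	memcpy(dst, src, len);
}
```

The streaming loads only help when src is a WC mapping; on ordinary
cacheable memory the result is the same as memcpy(), so the helper can be
exercised on plain buffers.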