On Fri, Aug 12, 2016 at 03:30:55PM +0300, Ville Syrjälä wrote:
> On Fri, Aug 12, 2016 at 12:39:59PM +0100, Chris Wilson wrote:
> > +#ifdef CONFIG_AS_MOVNTDQA
> > +static void __memcpy_ntdqa(void *dst, const void *src, unsigned long len)
> > +{
> > +	kernel_fpu_begin();
> > +
> > +	len >>= 4;
> > +	while (len >= 4) {
> > +		asm("movntdqa (%0), %%xmm0\n"
> > +		    "movntdqa 16(%0), %%xmm1\n"
> > +		    "movntdqa 32(%0), %%xmm2\n"
> > +		    "movntdqa 48(%0), %%xmm3\n"
> > +		    "movaps %%xmm0, (%1)\n"
> > +		    "movaps %%xmm1, 16(%1)\n"
> > +		    "movaps %%xmm2, 32(%1)\n"
> > +		    "movaps %%xmm3, 48(%1)\n"
>
> Not using sse2 movntdq for the store? No benefit or?

At least in the scenarios we, ok I, have in mind, leaving the dst in the
cache benefits us as we immediately process/move the data on.
-Chris

--
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
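For readers less familiar with the two store flavours being discussed, here is a minimal user-space sketch (not the i915 patch itself) using the SSE intrinsics from <immintrin.h>. Both variants read with MOVNTDQA (_mm_stream_load_si128). copy_ntdqa_cached() writes with an ordinary aligned store, as the patch's movaps does, so the destination lines stay in the cache for whoever consumes them next. copy_ntdqa_streaming() writes with MOVNTDQ (_mm_stream_si128); those stores bypass the cache and are weakly ordered, so an SFENCE is needed before any later store (for example a "copy done" flag) can be relied on to follow them. The helper names, the one-register-per-iteration loop (the patch unrolls by four), the runtime CPU check and the memcpy() fallback are all assumptions for illustration; kernel_fpu_begin()/kernel_fpu_end() are only needed in kernel context. Note also that on ordinary write-back memory, as in a quick user-space test, MOVNTDQA behaves like a normal load; the streaming-load benefit applies to WC mappings such as the GPU buffers the patch targets.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>
#define HAVE_X86_SSE 1
#else
#define HAVE_X86_SSE 0
#endif

#if HAVE_X86_SSE
/* Hypothetical helper: MOVNTDQA loads, ordinary (cached) aligned stores.
 * The destination lines are left in the cache for the next reader. */
__attribute__((target("sse4.1")))
static void copy_ntdqa_cached(void *dst, const void *src, size_t len)
{
	__m128i *d = (__m128i *)dst;
	__m128i *s = (__m128i *)src; /* some headers take a non-const pointer */

	for (size_t i = 0; i < len / 16; i++) {
		__m128i v = _mm_stream_load_si128(s + i); /* MOVNTDQA */
		_mm_store_si128(d + i, v);                /* regular aligned store */
	}
}

/* Hypothetical helper: MOVNTDQA loads, MOVNTDQ non-temporal stores.
 * The stores bypass the cache, are weakly ordered, and need SFENCE. */
__attribute__((target("sse4.1")))
static void copy_ntdqa_streaming(void *dst, const void *src, size_t len)
{
	__m128i *d = (__m128i *)dst;
	__m128i *s = (__m128i *)src;

	for (size_t i = 0; i < len / 16; i++) {
		__m128i v = _mm_stream_load_si128(s + i); /* MOVNTDQA */
		_mm_stream_si128(d + i, v);               /* MOVNTDQ */
	}
	_mm_sfence(); /* order the streaming stores before any later stores */
}
#endif

/* Hypothetical dispatcher: both pointers 16-byte aligned, len a multiple
 * of 16. Falls back to memcpy() when SSE4.1 (MOVNTDQA) is unavailable. */
static void copy_from_wc(void *dst, const void *src, size_t len,
			 int keep_in_cache)
{
	assert((uintptr_t)dst % 16 == 0 && (uintptr_t)src % 16 == 0);
	assert(len % 16 == 0);

#if HAVE_X86_SSE
	if (__builtin_cpu_supports("sse4.1")) {
		if (keep_in_cache)
			copy_ntdqa_cached(dst, src, len);
		else
			copy_ntdqa_streaming(dst, src, len);
		return;
	}
#endif
	memcpy(dst, src, len);
}
```

Both paths produce identical bytes; the only difference is where the destination ends up. If the copied data is processed immediately, as in the scenario described above, the cached store saves a trip back to memory; if it will not be touched again soon, the streaming store avoids evicting other useful lines.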