Re: [CI 2/2] drm/i915: Use SSE4.1 movntdqa to accelerate reads from WC memory

On Fri, Aug 12, 2016 at 03:30:55PM +0300, Ville Syrjälä wrote:
> On Fri, Aug 12, 2016 at 12:39:59PM +0100, Chris Wilson wrote:
> > +#ifdef CONFIG_AS_MOVNTDQA
> > +static void __memcpy_ntdqa(void *dst, const void *src, unsigned long len)
> > +{
> > +	kernel_fpu_begin();
> > +
> > +	len >>= 4;
> > +	while (len >= 4) {
> > +		asm("movntdqa   (%0), %%xmm0\n"
> > +		    "movntdqa 16(%0), %%xmm1\n"
> > +		    "movntdqa 32(%0), %%xmm2\n"
> > +		    "movntdqa 48(%0), %%xmm3\n"
> > +		    "movaps %%xmm0,   (%1)\n"
> > +		    "movaps %%xmm1, 16(%1)\n"
> > +		    "movaps %%xmm2, 32(%1)\n"
> > +		    "movaps %%xmm3, 48(%1)\n"
> 
> Not using sse2 movntdq for the store? No benefit or?

At least in the scenarios we (ok, I) have in mind, leaving dst in the
cache benefits us, since we immediately process the data or move it on.
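
To make the trade-off concrete, here is a rough userspace sketch, not the
kernel code from the patch. It pairs the streaming load (movntdqa, via
_mm_stream_load_si128) with either a regular aligned store, which leaves
dst in the cache, or a non-temporal store (movntdq, via _mm_stream_si128),
which bypasses it. The names copy_blocks_sse41, copy_from_wc and the
cached_dst flag are made up for illustration. The runtime SSE4.1 check and
the memcpy fallback are assumptions of the sketch as well. On ordinary
write-back memory movntdqa behaves like a normal load, so this only shows
the instruction selection, not the WC speedup.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>
#define HAVE_X86 1
#endif

#ifdef HAVE_X86
/* Hypothetical helper: copy len bytes (a multiple of 64) between 16-byte
 * aligned buffers, four 16-byte streaming loads per iteration. */
__attribute__((target("sse4.1")))
static void copy_blocks_sse41(void *dst, const void *src, size_t len,
			      int cached_dst)
{
	__m128i *d = (__m128i *)dst;
	__m128i *s = (__m128i *)src; /* intrinsic takes a non-const pointer */
	size_t n = len / 16;
	size_t i;

	assert(len % 64 == 0);

	for (i = 0; i + 4 <= n; i += 4) {
		/* movntdqa: streaming loads from the source */
		__m128i x0 = _mm_stream_load_si128(s + i + 0);
		__m128i x1 = _mm_stream_load_si128(s + i + 1);
		__m128i x2 = _mm_stream_load_si128(s + i + 2);
		__m128i x3 = _mm_stream_load_si128(s + i + 3);

		if (cached_dst) {
			/* regular aligned stores: dst stays in the cache */
			_mm_store_si128(d + i + 0, x0);
			_mm_store_si128(d + i + 1, x1);
			_mm_store_si128(d + i + 2, x2);
			_mm_store_si128(d + i + 3, x3);
		} else {
			/* movntdq: non-temporal stores bypass the cache */
			_mm_stream_si128(d + i + 0, x0);
			_mm_stream_si128(d + i + 1, x1);
			_mm_stream_si128(d + i + 2, x2);
			_mm_stream_si128(d + i + 3, x3);
		}
	}

	if (!cached_dst)
		_mm_sfence(); /* order the non-temporal stores */
}
#endif

/* Hypothetical wrapper: use the SSE4.1 path for the aligned 64-byte
 * blocks when available, and plain memcpy for everything else. */
static void copy_from_wc(void *dst, const void *src, size_t len,
			 int cached_dst)
{
	size_t done = 0;

#ifdef HAVE_X86
	if (__builtin_cpu_supports("sse4.1") &&
	    (((uintptr_t)dst | (uintptr_t)src) % 16) == 0) {
		done = len & ~(size_t)63;
		copy_blocks_sse41(dst, src, done, cached_dst);
	}
#endif
	memcpy((char *)dst + done, (const char *)src + done, len - done);
}
```

With cached_dst set, the copy matches the movaps choice in the patch: the
consumer that reads dst straight afterwards should hit the cache. Clearing
it would only make sense if dst is not touched again soon.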
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre