On Fri, Aug 12, 2016 at 11:54:04AM +0100, Tvrtko Ursulin wrote:
> On 12/08/16 07:25, akash.goel@xxxxxxxxx wrote:
> >From: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
> >
> >This patch provides the infrastructure for performing a 16-byte aligned
> >read from WC memory using non-temporal instructions introduced with sse4.1.
> >Using movntdqa we can bypass the CPU caches and read directly from memory,
> >ignoring the page attributes set on the CPU PTE, i.e. negating the
> >impact of an otherwise UC access. Copying using movntdqa from WC is almost
> >as fast as reading from WB memory, modulo the possibility of either hitting
> >the CPU cache or leaving the data in the CPU cache for the next consumer.
> >(The CPU cache itself may be flushed for the region of the movntdqa, and on
> >later access the movntdqa reads from a separate internal buffer for the
> >cacheline.) The write back to memory is, however, cached.
> >
> >This will be used in later patches to accelerate accessing WC memory.
> >
> >v2: Report whether the accelerated copy is successful/possible.
> >v3: Function alignment override was only necessary when using the
> >function target("sse4.1") - which is not necessary for emitting movntdqa
> >from __asm__.
> >v4: Improve notes on CPU cache behaviour vs non-temporal stores.
> >v5: Fix byte offsets for unrolled moves.
> >v6: Find all remaining typos of movntqda, use kernel_fpu_begin.
> >
> >Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
> >Cc: Akash Goel <akash.goel@xxxxxxxxx>
> >Cc: Damien Lespiau <damien.lespiau@xxxxxxxxx>
> >Cc: Mika Kuoppala <mika.kuoppala@xxxxxxxxx>
> >Cc: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx>

Picked up the 2 WC prep patches. Thanks for the review, testing and
improvements,
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
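For readers following the thread, the copy scheme described in the quoted changelog (16-byte aligned movntdqa loads, ordinary cached stores, and a return value reporting whether the accelerated path was possible, per v2) can be sketched in userspace C. This is a minimal illustration under stated assumptions, not the i915 code: the patch itself emits movntdqa from __asm__ and brackets it with kernel_fpu_begin(), whereas this sketch uses the SSE4.1 intrinsic _mm_stream_load_si128 plus a GCC/Clang target("sse4.1") attribute and __builtin_cpu_supports(). The names wc_memcpy_from, copy_movntdqa and copy_from_wc are hypothetical. Note that on ordinary write-back memory the non-temporal hint may simply be ignored, so the speedup only shows up on real WC mappings.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__x86_64__) || defined(__i386__)
#include <smmintrin.h> /* SSE4.1 intrinsics: _mm_stream_load_si128 (movntdqa) */

/* Hypothetical helper: copy len bytes, 16 at a time, with movntdqa loads.
 * Assumes src, dst and len are all 16-byte aligned (movntdqa faults on an
 * unaligned source). */
__attribute__((target("sse4.1")))
static void copy_movntdqa(void *dst, const void *src, size_t len)
{
    assert((((uintptr_t)src | (uintptr_t)dst | len) & 15) == 0);
    __m128i *s = (__m128i *)src; /* intrinsic takes a non-const pointer */
    __m128i *d = (__m128i *)dst;
    for (size_t i = 0; i < len / 16; i++) {
        __m128i v = _mm_stream_load_si128(s + i); /* non-temporal load */
        _mm_store_si128(d + i, v);                /* ordinary cached store */
    }
}
#endif

/* Hypothetical API: returns true if the accelerated copy was performed,
 * false if it is not possible (misaligned arguments, no SSE4.1, non-x86). */
static bool wc_memcpy_from(void *dst, const void *src, size_t len)
{
#if defined(__x86_64__) || defined(__i386__)
    if (((uintptr_t)src | (uintptr_t)dst | len) & 15)
        return false;
    if (!__builtin_cpu_supports("sse4.1"))
        return false;
    copy_movntdqa(dst, src, len);
    return true;
#else
    (void)dst;
    (void)src;
    (void)len;
    return false;
#endif
}

/* Hypothetical caller: try the streaming path, fall back to memcpy. */
static void copy_from_wc(void *dst, const void *src, size_t len)
{
    if (!wc_memcpy_from(dst, src, len))
        memcpy(dst, src, len);
}
```

Reporting success instead of silently falling back keeps the fast path honest: the caller decides what to do when alignment or CPU support rules it out, which matches the "report whether the accelerated copy is successful/possible" note in v2.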