Re: [PATCH] drm/i915: Use SSE4.1 movntdqa to accelerate reads from WC memory

On 18/07/16 12:35, Chris Wilson wrote:
> On Mon, Jul 18, 2016 at 12:15:32PM +0100, Tvrtko Ursulin wrote:
>> I am not sure about this, but looking at the raid6 code, for example, it
>> has a lot more annotations in cases like this.
>>
>> It seems to be telling the compiler which memory ranges each
>> instruction accesses, and it also uses "asm volatile"; whether or not
>> that is really needed I don't know.
>>
>> For example:
>>                  asm volatile("movdqa %0,%%xmm4" :: "m" (dptr[z0][d]));
>>
>> And:
>>                  asm volatile("movdqa %%xmm4,%0" : "=m" (q[d]));
>>
>> Each one is telling the compiler that the instruction is either reading
>> from or writing to a certain memory address, respectively.
>>
>> You don't have any of that, and don't even specify anything as an
>> output parameter, so I am not sure your code is safe.
>
> The asm is correct. We do not modify either of the two pointers, which we
> pass in via register inputs, but rather the memory behind them; hence the
> memory clobber.

This is a choice of how much we let the compiler decide about addressing, and how much we tell it about what the asm code really does. The examples above get the compiler to generate *any* suitable addressing mode for each specific location involved in the transfers, so the compiler knows a lot about what's happening and can track where each datum comes from and goes to.

OTOH Chris' code

+        asm("movntdqa   (%0), %%xmm0\n"
+            "movntdqa 16(%0), %%xmm1\n"
+            "movntdqa 32(%0), %%xmm2\n"
+            "movntdqa 48(%0), %%xmm3\n"
+            "movaps %%xmm0,   (%1)\n"
+            "movaps %%xmm1, 16(%1)\n"
+            "movaps %%xmm2, 32(%1)\n"
+            "movaps %%xmm3, 48(%1)\n"
+            :: "r" (src), "r" (dst) : "memory");

- doesn't need "volatile" because asm statements that have no output operands are implicitly volatile.

- makes the compiler give us the source and destination *addresses* in a register each; beyond that, it doesn't know what we're doing with them, so the third ("clobbers") parameter has to say "memory" i.e. treat *all* memory contents as unknown after this.

[[From GCC docs: The "memory" clobber tells the compiler that the assembly code performs memory reads or writes to items other than those listed in the input and output operands (for example, accessing the memory pointed to by one of the input parameters). To ensure memory contains correct values, GCC may need to flush specific register values to memory before executing the asm. Further, the compiler does not assume that any values read from memory before an asm remain unchanged after that asm; it reloads them as needed. Using the "memory" clobber effectively forms a read/write memory barrier for the compiler.]]

BTW, should we not tell it we've *also* clobbered %xmm[0-3]?

So they're both correct, just taking different approaches. I don't know which would give better performance for this specific case.

.Dave.

_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx