Re: [PATCH] drm/i915: Use SSE4.1 movntdqa to accelerate reads from WC memory

On 18/07/16 12:35, Chris Wilson wrote:
On Mon, Jul 18, 2016 at 12:15:32PM +0100, Tvrtko Ursulin wrote:
I am not sure about this, but looking at the raid6 code, for example,
it has a lot more annotations in cases like this.

It seems to be telling the compiler which memory ranges each
instruction accesses, and it also uses "asm volatile" - whether or not
that is really needed I don't know.

For example:
                 asm volatile("movdqa %0,%%xmm4" :: "m" (dptr[z0][d]));

And:
                 asm volatile("movdqa %%xmm4,%0" : "=m" (q[d]));

Each one tells the compiler that the instruction is reading from or
writing to, respectively, a certain memory address.

You don't have any of that, and you don't specify any output
operands at all, so I am not sure if your code is safe.

The asm is correct. We do not modify either of the two pointers which we
pass in via register inputs, but the memory behind them - hence the memory
clobber.

So you are saying the memory clobber is like a big hammer, telling the compiler that you are modifying some memory and that it is not allowed to assume or remove anything?

There would be no benefit in being more specific, like the RAID code does?
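
To make the two styles under discussion concrete, here is a minimal userspace sketch, not code from the patch. The helper names (copy16_clobber, copy16_operands, copy16_check) and struct blk16 are hypothetical, it uses plain movdqu rather than movntdqa, and it falls back to memcpy() on non-x86-64 builds. The first helper passes the pointers in registers and relies on a "memory" clobber. The second describes the exact 16 bytes read and written through "m" and "=m" operands, in the style of the raid6 code.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical 16-byte block type used only to size the memory operands. */
struct blk16 {
	uint8_t b[16];
} __attribute__((__may_alias__));

/* Style 1: pointers as register inputs plus a blanket "memory" clobber. */
static void copy16_clobber(void *dst, const void *src)
{
#if defined(__x86_64__)
	__asm__ __volatile__("movdqu (%0), %%xmm0\n\t"
			     "movdqu %%xmm0, (%1)"
			     : /* no outputs */
			     : "r" (src), "r" (dst)
			     : "xmm0", "memory");
#else
	memcpy(dst, src, 16);	/* portable fallback, assumption for non-x86 */
#endif
}

/* Style 2: precise memory operands, so no "memory" clobber is needed. */
static void copy16_operands(void *dst, const void *src)
{
#if defined(__x86_64__)
	__asm__ __volatile__("movdqu %1, %%xmm0\n\t"
			     "movdqu %%xmm0, %0"
			     : "=m" (*(struct blk16 *)dst)
			     : "m" (*(const struct blk16 *)src)
			     : "xmm0");
#else
	memcpy(dst, src, 16);	/* portable fallback, assumption for non-x86 */
#endif
}

/* Returns 1 if both helpers copied the 16 source bytes correctly. */
static int copy16_check(void)
{
	uint8_t src[16], a[16] = { 0 }, b[16] = { 0 };

	for (int i = 0; i < 16; i++)
		src[i] = (uint8_t)(i * 3 + 1);

	copy16_clobber(a, src);
	copy16_operands(b, src);

	return memcmp(a, src, 16) == 0 && memcmp(b, src, 16) == 0;
}
```

With the precise operands the compiler only has to assume that those 16 bytes changed, whereas the "memory" clobber forces it to treat all memory as potentially read and written across the asm statement. Whether that difference matters for a copy routine like the one in the patch is the open question here.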

+void i915_memcpy_init_early(struct drm_i915_private *dev_priv)
+{
+	if (static_cpu_has(X86_FEATURE_XMM4_1))
+		static_branch_enable(&has_movntdqa);
+}


I was not familiar with static key stuff and the only thing I can
notice is that it is used very little throughout the kernel. On the
other hand I haven't found any references in the documentation that
it should be used sparingly or something.

But the general question would be - is it worth it here? Static
branches should be really efficient in the off case, correct? And we
don't really care about the performance of the off case here. So
would it be just as good to use a normal branch?

It's not the cost of the branch, but the cost of the static_cpu_has()
check in comparison to a small copy.

Could cache it in dev_priv. Not saying you should, just asking about the pros and cons of the two approaches versus the amount of code. Well, not even the amount of code; static keys just seem to be used so little that I am wondering why.
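
For comparison, the "cache it in dev_priv" alternative could look roughly like the userspace sketch below. Everything in it is an assumption made for illustration: struct fake_dev_priv, fake_memcpy_init_early() and fake_memcpy_from_wc() are hypothetical names, the feature test uses the compiler builtin __builtin_cpu_supports() instead of the kernel's static_cpu_has(), and the "fast" path is just memcpy() standing in for the movntdqa loop.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the driver-private structure. */
struct fake_dev_priv {
	bool has_movntdqa;	/* cached once at init time */
};

/* Query the CPU once and remember the answer. */
static void fake_memcpy_init_early(struct fake_dev_priv *priv)
{
#if defined(__x86_64__) || defined(__i386__)
	__builtin_cpu_init();
	priv->has_movntdqa = __builtin_cpu_supports("sse4.1") != 0;
#else
	priv->has_movntdqa = false;	/* assumption: no fast path elsewhere */
#endif
}

/*
 * Copy len bytes and report which path was chosen. The per-call cost is
 * one load of the cached flag plus an ordinary conditional branch.
 */
static bool fake_memcpy_from_wc(const struct fake_dev_priv *priv,
				void *dst, const void *src, size_t len)
{
	if (!priv->has_movntdqa) {
		memcpy(dst, src, len);	/* slow path */
		return false;
	}

	/* A real implementation would issue movntdqa loads here. */
	memcpy(dst, src, len);
	return true;
}
```

The trade-off against a static key is that every call pays for a memory load and a normal branch, and the flag has to be reachable from the copy routine, whereas a static key patches the branch into the instruction stream at the cost of a little extra setup code.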

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx



