Re: AVX - cast from YMM to XMM without overhead?

Marc Glisse <marc.glisse@xxxxxxxx> · Sun, 24 Jun 2012 23:30:19 +0200 (CEST)

On Sun, 24 Jun 2012, Dag Lem wrote:

Marc Glisse <marc.glisse@xxxxxxxx> writes:

On Sun, 24 Jun 2012, Dag Lem wrote:

[...]

Casting a vector to a vector with a different number of elements: I am
not sure what that is supposed to mean. Casting a pointer: I know this
means reinterpreting what is in memory there. In C++ you could also
consider casting to a reference to v4sf (equivalent to the pointer
cast) but I am not sure that it is supported yet (related to bug
53121).

I guess you're right. Casting as in my naïve approach would also have
limited usefulness in that it would only be able to extract the lower
(and not the higher) bits of a vector register. However I can't help
thinking that some kind of intrinsic/operator/whatever to extract
vector register parts would be useful.

But they do exist. You even criticized one for generating a superfluous 
mov and reported a bug against one that mysteriously started generating 
vshufps ;-)
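For readers joining here, a minimal sketch of SSE intrinsics that pick out the upper half of an XMM register, which a plain cast cannot express. Which intrinsics were meant earlier in the thread is not shown in this message, so the choice of _mm_movehl_ps and _mm_storeh_pi is an assumption, and the function names are hypothetical. For YMM registers the analogous intrinsics would be _mm256_castps256_ps128 (low half) and _mm256_extractf128_ps (either half).

```c
#include <assert.h>
#include <string.h>
#include <xmmintrin.h>

/* Hypothetical helper: move elements 2 and 3 of x into the low half.
   The result is {x2, x3, x2, x3}. */
static __m128 high_half_in_low(__m128 x) {
    return _mm_movehl_ps(x, x);
}

/* Hypothetical helper: store elements 2 and 3 of x to 8 bytes of memory. */
static void store_high_half(__m64 *out, __m128 x) {
    _mm_storeh_pi(out, x);
}

/* Small self-check that both helpers return the upper two floats. */
static void check_high_half(void) {
    __m128 x = _mm_setr_ps(1.0f, 2.0f, 3.0f, 4.0f);
    float buf[4];
    float pair[2];
    __m64 hi;

    _mm_storeu_ps(buf, high_half_in_low(x));
    assert(buf[0] == 3.0f && buf[1] == 4.0f);

    store_high_half(&hi, x);
    memcpy(pair, &hi, sizeof pair);   /* read the 8 bytes back as two floats */
    assert(pair[0] == 3.0f && pair[1] == 4.0f);
}
```

Whether the compiler turns these into a single instruction is not guaranteed and is worth checking in the generated assembly.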

How does one even specify the
following using pointer casts? Please try, and then tell me the result
is not ugly ;-)

 *z = (v2sf)d;  // Naïve attempt to generate vmovlps d, z

_mm_storel_pi(z,d);

or *z=*(v2sf*)&d; if you really want pointers. I agree it isn't that 
pretty, but it could be worse...

Ideally, the following would also generate the same code:
v2sf tmp={d[0],d[1]}; *z=tmp;
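The three spellings above can be put side by side as a small sketch. It assumes GCC-style vector extensions and that z points to a v2sf. The typedefs and function names are hypothetical, and the __m64 cast in the intrinsic version is added here only to match the intrinsic's prototype. All three should store the same eight bytes, but whether they compile to the same instruction depends on the compiler version.

```c
#include <assert.h>
#include <string.h>
#include <xmmintrin.h>

/* Hypothetical typedefs mirroring the v4sf/v2sf names used in the thread. */
typedef float v4sf __attribute__((vector_size(16)));
typedef float v2sf __attribute__((vector_size(8)));

/* 1. SSE intrinsic: store the low two floats of d to *z. */
static void store_low_intrinsic(v2sf *z, v4sf d) {
    _mm_storel_pi((__m64 *)z, (__m128)d);
}

/* 2. Pointer cast: reinterpret the first 8 bytes of d as a v2sf. */
static void store_low_cast(v2sf *z, v4sf d) {
    *z = *(v2sf *)&d;
}

/* 3. Element-wise construction via vector subscripting. */
static void store_low_elements(v2sf *z, v4sf d) {
    v2sf tmp = {d[0], d[1]};
    *z = tmp;
}

/* Small self-check that all three variants agree. */
static void check_low_half_stores(void) {
    v4sf d = {1.0f, 2.0f, 3.0f, 4.0f};
    v2sf a, b, c;

    store_low_intrinsic(&a, d);
    store_low_cast(&b, d);
    store_low_elements(&c, d);

    assert(a[0] == 1.0f && a[1] == 2.0f);
    assert(memcmp(&a, &b, sizeof a) == 0);
    assert(memcmp(&a, &c, sizeof a) == 0);
}
```

Comparing the output of gcc -O3 -S for the three functions shows which spellings a given compiler version recognizes as a plain 64-bit store.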

OK, so *((v4sf*)&a) is not supposed to be a no-op, but presently
isn't?

Strike the "not" in your sentence, and yes. It manages it for
*(float*)&a (taking the first element) but not for subvectors (yet).

Thank you, that's good news!

Although since I was the one who tried to fix it and I gave up, you may 
have to wait a bit for someone else to be motivated...

I have to say, I am impressed by the code generated for this at -O3:

#include <x86intrin.h>
__v2sf f(__v4sf x){
  __v2sf d=*(__v2sf*)&x;
  return d;
}

	movaps	%xmm0, -72(%rsp)
	movq	-72(%rsp), %rax
	movq	%rax, -80(%rsp)
	movlps	-80(%rsp), %xmm0

--
Marc Glisse