Re: AVX - cast from YMM to XMM without overhead?


 



On Sun, 24 Jun 2012, Dag Lem wrote:

Marc Glisse <marc.glisse@xxxxxxxx> writes:

On Sun, 24 Jun 2012, Dag Lem wrote:


[...]

Casting a vector to a vector with a different number of elements: I am
not sure what that is supposed to mean. Casting a pointer: I know this
means reinterpreting what is in memory there. In C++ you could also
consider casting to a reference to v4sf (equivalent to the pointer
cast) but I am not sure that it is supported yet (related to bug
53121).
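
To make the "reinterpreting what is in memory" reading concrete, here is a minimal sketch using the GCC/Clang generic vector extensions. The typedefs v8sf and v4sf and the helpers low_half and high_half are names made up for this illustration; they are not from the thread or from any header. It uses memcpy instead of a pointer cast, and it makes no claim about what code a given compiler version emits for it.

```c
#include <string.h>

/* Hypothetical typedefs for illustration (GCC/Clang vector extensions). */
typedef float v8sf __attribute__((vector_size(32)));
typedef float v4sf __attribute__((vector_size(16)));

/* Low half: the first 16 bytes of the 32-byte object, i.e. elements 0..3. */
v4sf low_half(const v8sf *a) {
    v4sf r;
    memcpy(&r, a, sizeof r);
    return r;
}

/* High half: the following 16 bytes, i.e. elements 4..7. */
v4sf high_half(const v8sf *a) {
    v4sf r;
    memcpy(&r, (const char *)a + sizeof r, sizeof r);
    return r;
}
```

The wide vector is passed by pointer here only so that the sketch also compiles cleanly when AVX is not enabled on the command line.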

I guess you're right. Casting as in my naïve approach would also have
limited usefulness in that it would only be able to extract the lower
(and not the higher) bits of a vector register. However I can't help
thinking that some kind of intrinsic/operator/whatever to extract
vector register parts would be useful.

But they do exist. You even criticized one for generating a superfluous mov and reported a bug against one that mysteriously started generating vshufps ;-)

How does one even specify the
following using pointer casts? Please try, and then tell me the result
is not ugly ;-)

 *z = (v2sf)d;  // Naïve attempt to generate vmovlps d, z

_mm_storel_pi(z,d);

or *z=*(v2sf*)&d; if you really want pointers. I agree it isn't that pretty, but it could be worse...

Ideally, the following would also generate the same code:
v2sf tmp={d[0],d[1]}; *z=tmp;
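
For comparison, the element-wise and pointer-cast spellings can be put side by side in a small sketch. The typedefs and function names are assumptions for this example. In particular, v2sf_alias is a hypothetical variant that adds GCC's may_alias attribute, so that reading a v4sf object through it is not a type-based aliasing problem. Whether either function compiles down to a single movlps/vmovlps depends on the compiler version.

```c
/* Hypothetical typedefs for illustration (GCC/Clang vector extensions). */
typedef float v4sf __attribute__((vector_size(16)));
typedef float v2sf __attribute__((vector_size(8)));
/* Same layout as v2sf, but permitted to alias objects of other types. */
typedef float v2sf_alias __attribute__((vector_size(8), may_alias));

/* Element-wise: build the 2-float vector from lanes 0 and 1 of d. */
void store_low_elementwise(v2sf *z, v4sf d) {
    v2sf tmp = { d[0], d[1] };
    *z = tmp;
}

/* Pointer cast: reinterpret the first 8 bytes of d as a 2-float vector. */
void store_low_cast(v2sf *z, v4sf d) {
    *z = (v2sf)*(v2sf_alias *)&d;
}
```

Both functions store the same two floats; they differ only in how the intent is expressed to the compiler.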

OK, so *((v4sf*)&a) is not supposed to be a no-op, but presently
isn't?

Strike the "not" in your sentence, and yes. It manages it for
*(float*)&a (taking the first element) but not for subvectors (yet).

Thank you, that's good news!

Although since I was the one who tried to fix it and I gave up, you may have to wait a bit for someone else to be motivated...


I have to say, I am impressed by the code generated for this at -O3:

#include <x86intrin.h>
__v2sf f(__v4sf x){
  __v2sf d=*(__v2sf*)&x;
  return d;
}

	movaps	%xmm0, -72(%rsp)
	movq	-72(%rsp), %rax
	movq	%rax, -80(%rsp)
	movlps	-80(%rsp), %xmm0

--
Marc Glisse

