On 2019-12-16 08:16 -0500, Chris Elrod wrote:
> I'm not the one who asked, but I would strongly prefer m256 if code
> could be generated masking the unused lane for safe loads/stores, at
> least on architectures where this is efficient (e.g., Skylake-X).
> This automatic masking would make writing SIMD code easier when you
> don't have powers of 2, by saving the effort of passing the bitmask
> to each operation (which is at least an option with immintrin.h; not
> sure about GCC's built-ins).

I prefer m256 too.  I'm already using vector_size(4*sizeof(double)) for
some calculations in 3-D Euclidean space (only 3 elements are really
used).

> However, if the asker doesn't want this for SIMD code, but wants a
> convenient vector to index for scalar code, I'd recommend defining
> your own class.  Indexing SIMD vectors is inefficient, and it may
> interfere with optimizations like SROA.  But I could be wrong; my
> experience is mostly with Julia, which uses LLVM.  GCC may do better.

I want SIMD code and I don't need much indexing.  But just out of
curiosity, why is indexing SIMD vectors inefficient?

-- 
Xi Ruoyao <xry111@xxxxxxxxxxxxxxxx>
School of Aerospace Science and Technology, Xidian University