Re: SIMD accelerated crush_do_rule proof of concept

Loic Dachary <loic@xxxxxxxxxxx> · Mon, 29 Aug 2016 16:54:11 +0200

Hi Vincent,

On 29/08/2016 16:08, Vincent JARDIN wrote:
> On 29/08/2016 at 15:55, Sage Weil wrote:
>> To answer your question, the only real risk/problem I see is that we need
>> to keep the optimized code perfectly in sync with the non-optimized variant
> 
> I propose a generic implementation that lets the SIMD code be shared across ARM, Intel and others (AltiVec).
> 
> 
> https://github.com/dachary/ceph/commit/71ae4584d9ed57f70aad718d0ffe206a01e91fef
> 
> For instance, you can try the following:
> #include <stdint.h>
> #include <immintrin.h>
>
> /* XOR two 32-byte vectors with the plain ^ operator (function name is illustrative). */
> void xor_example(__v32qi *out)
> {
> __v32qi va, vb;
> va = (__v32qi) { 31, 30, 29, 28, 27, 26, 25, 24, 23, 22, 21, 20, 19, 18, 17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 4, 1, 0 };
> vb = (__v32qi) { 31, 30, 29, 28, 27, 26, 25, 24, 23, 22, 21, 20, 19, 18, 17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0 };
> 
> *out = va ^ vb;
> }
> 
> It will produce optimized NEON, AVX or AVX2 code, depending on the target.

Generic code that relies on compiler optimizations is terse, which is nice. But the code is not really generic: it has to be written specifically for the optimizer, which is self-defeating. The article at http://locklessinc.com/articles/vectorize/ illustrates that in a fun way. Instead of maintaining code with SIMD instructions, you need to understand each optimizer by reading its assembly output, which is more complicated.
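
To make that concrete, here is a minimal sketch (not the code from the commit above) using the GCC/Clang vector extensions. The names v32u8, xor_scalar, xor_vector and xor_variants_agree are made up for this example. It assumes a compiler that supports __attribute__((vector_size)). Which instructions actually come out of either variant depends on the compiler, the flags and the target, and the only way to know is to read the generated assembly (for instance with gcc -O3 -S).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* GCC/Clang vector extension: 32 bytes handled as a single value.
 * The type name v32u8 is hypothetical, chosen for this sketch. */
typedef uint8_t v32u8 __attribute__((vector_size(32)));

/* Reference variant: a plain loop. Whether it gets vectorized is
 * entirely up to the optimizer. */
void xor_scalar(uint8_t *dst, const uint8_t *a, const uint8_t *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = a[i] ^ b[i];
}

/* Vector-extension variant: 32 bytes per step, then a scalar tail. */
void xor_vector(uint8_t *dst, const uint8_t *a, const uint8_t *b, size_t n)
{
    size_t i = 0;
    for (; i + sizeof(v32u8) <= n; i += sizeof(v32u8)) {
        v32u8 va, vb, vr;
        memcpy(&va, a + i, sizeof(va));   /* load without alignment assumptions */
        memcpy(&vb, b + i, sizeof(vb));
        vr = va ^ vb;                     /* element-wise XOR on the whole vector */
        memcpy(dst + i, &vr, sizeof(vr));
    }
    for (; i < n; i++)                    /* remaining bytes */
        dst[i] = a[i] ^ b[i];
}

/* Returns 1 when both variants produce identical output on a buffer
 * whose length is not a multiple of 32, so the tail path is exercised. */
int xor_variants_agree(void)
{
    uint8_t a[77], b[77], r1[77], r2[77];
    for (size_t i = 0; i < sizeof(a); i++) {
        a[i] = (uint8_t)(i * 7 + 3);
        b[i] = (uint8_t)(255 - i);
    }
    xor_scalar(r1, a, b, sizeof(a));
    xor_vector(r2, a, b, sizeof(a));
    return memcmp(r1, r2, sizeof(r1)) == 0;
}
```

A check like xor_variants_agree() is also one way to address the concern about keeping the two variants in sync: it verifies the results match, but it says nothing about whether the optimizer emitted the SIMD instructions you were hoping for.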

Cheers

-- 
Loïc Dachary, Artisan Logiciel Libre