Re: SIMD accelerated crush_do_rule proof of concept

Vincent JARDIN <vincent.jardin@xxxxxxxxx> · Mon, 29 Aug 2016 16:08:55 +0200

On 29/08/2016 at 15:55, Sage Weil wrote:
To answer your question, the only real risk/problem I see is that we need
to keep it perfectly in sync with the non-optimized variant

I would propose a generic implementation that lets the same SIMD code be
shared across ARM, Intel and others (AltiVec).

https://github.com/dachary/ceph/commit/71ae4584d9ed57f70aad718d0ffe206a01e91fef

For instance, you can try the following:
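#include <stdint.h>
#include <immintrin.h>

/* wrapped in a function (name is arbitrary) so that the snippet compiles */
void xor_example(void)
{
	__v32qi va, vb;

	va = (__v32qi) { 31, 30, 29, 28, 27, 26, 25, 24, 23, 22, 21, 20, 19, 18,
			 17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 4, 1, 0 };
	vb = (__v32qi) { 31, 30, 29, 28, 27, 26, 25, 24, 23, 22, 21, 20, 19, 18,
			 17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0 };

	/* element-wise XOR through the compiler's vector extension */
	__v32qi res = va ^ vb;
	(void) res;	/* silence the unused-variable warning */
}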

The compiler will then generate optimized NEON, AVX or AVX2 code, depending
on the target.
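
A slightly more portable sketch of the same idea, for what it is worth:
<immintrin.h> only exists on x86, so the version below declares its own
vector type with the GCC/Clang vector_size attribute instead of relying on
__v32qi. The names v32u8, xor32, xor32_scalar and xor32_selfcheck are made
up for this illustration and are not taken from the commit above. The scalar
variant and the self-check are only there to show one way of keeping the two
code paths in sync.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* 32 lanes of 8 bits, declared without any target-specific header
 * (GCC/Clang vector extension; the type name is hypothetical). */
typedef uint8_t v32u8 __attribute__((vector_size(32)));

/* XOR two 32-byte buffers; instruction selection is left to the compiler. */
static void xor32(uint8_t *dst, const uint8_t *a, const uint8_t *b)
{
	v32u8 va, vb, vr;

	memcpy(&va, a, sizeof(va));	/* memcpy keeps unaligned input safe */
	memcpy(&vb, b, sizeof(vb));
	vr = va ^ vb;			/* element-wise XOR on all 32 lanes */
	memcpy(dst, &vr, sizeof(vr));
}

/* Non-optimized reference variant. */
static void xor32_scalar(uint8_t *dst, const uint8_t *a, const uint8_t *b)
{
	for (int i = 0; i < 32; i++)
		dst[i] = a[i] ^ b[i];
}

/* Checks on one sample input that both variants agree. */
static void xor32_selfcheck(void)
{
	uint8_t a[32], b[32], r1[32], r2[32];

	for (int i = 0; i < 32; i++) {
		a[i] = (uint8_t)(31 - i);
		b[i] = (uint8_t)(i * 7 + 3);
	}
	xor32(r1, a, b);
	xor32_scalar(r2, a, b);
	assert(memcmp(r1, r2, sizeof(r1)) == 0);
}
```

Calling xor32_selfcheck() from a unit test would flag any divergence between
the two variants. On a target without 256-bit registers the compiler is free
to split the operation into narrower instructions.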
