On 13 July 2017 at 18:51, Markus Stockhausen <stockhausen@xxxxxxxxxxx> wrote:
>> From: Ard Biesheuvel [ard.biesheuvel@xxxxxxxxxx]
>> Sent: Thursday, 13 July 2017 19:16
>> To: linux-arm-kernel@xxxxxxxxxxxxxxxxxxx; linux-raid@xxxxxxxxxxxxxxx
>> Cc: shli@xxxxxxxxxx; Markus Stockhausen; linux@xxxxxxxxxxxxxxx; will.deacon@xxxxxxx; catalin.marinas@xxxxxxx; Ard Biesheuvel
>> Subject: [PATCH 1/2] md/raid6: use faster multiplication for ARM NEON delta syndrome
>>
>> The P/Q left side optimization in the delta syndrome simply involves
>> repeatedly multiplying a value by polynomial 'x' in GF(2^8). Given
>> that 'x * x * x * x' equals 'x^4' even in the polynomial world, we
>> can accelerate this substantially by performing up to 4 such operations
>> at once, using the NEON instructions for polynomial multiplication.
>>
>> Results on a Cortex-A57 running in 64-bit mode:
>>
>> Before:
>> -------
>> raid6: neonx1 xor() 1680 MB/s
>> raid6: neonx2 xor() 2286 MB/s
>> raid6: neonx4 xor() 3162 MB/s
>> raid6: neonx8 xor() 3389 MB/s
>>
>> After:
>> ------
>> raid6: neonx1 xor() 2281 MB/s
>> raid6: neonx2 xor() 3362 MB/s
>> raid6: neonx4 xor() 3787 MB/s
>> raid6: neonx8 xor() 4239 MB/s
>
> Nice optimization. Nevertheless, the test algorithm favours this implementation. See:
>
> int start = (disks>>1)-1, stop = disks-3; /* work on the second half of the disks */
>
> What does the before/after test give if you work on the middle data disks rather
> than the rightmost ones? For a 4K page size this should be start = 3, stop = 11
> instead of start = 7, stop = 13. Given the large gain you see, the impact should
> be lower, but still at least in the >10% range.
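The patch description's claim that up to four multiplications by x can be done at once can be checked with a small scalar sketch in C. It assumes the RAID-6 field polynomial x^8 + x^4 + x^3 + x^2 + 1 (0x11d) used by the Linux md/raid6 code, and uses a plain C carry-less multiply, clmul8(), as a stand-in for one 8-bit lane of a NEON polynomial multiply. All function names are hypothetical; this is an illustration of the idea, not the code from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Low 8 bits of the assumed RAID-6 field polynomial x^8+x^4+x^3+x^2+1 (0x11d). */
#define RAID6_POLY_LOW 0x1d

/* Multiply v by x once: shift left and fold an overflowing x^8 term back in. */
static uint8_t gf_mul_x(uint8_t v)
{
	return (uint8_t)((v << 1) ^ ((v & 0x80) ? RAID6_POLY_LOW : 0));
}

/* Carry-less (polynomial) multiply keeping only the low 8 bits; a scalar
 * stand-in (assumption) for one lane of a NEON polynomial multiply. */
static uint8_t clmul8(uint8_t a, uint8_t b)
{
	uint8_t r = 0;
	for (int i = 0; i < 8; i++)
		if (b & (1u << i))
			r ^= (uint8_t)(a << i);
	return r;
}

/* Multiply v by x^k (1 <= k <= 4) in one step. The k bits shifted out
 * stand for hi * x^8, and x^8 == 0x1d modulo the field polynomial, so they
 * are folded back with one carry-less multiply. For k <= 4 the product
 * hi * 0x1d has degree at most 7, so no second reduction is needed. */
static uint8_t gf_mul_xk(uint8_t v, unsigned k)
{
	uint8_t hi = (uint8_t)(v >> (8 - k));
	return (uint8_t)((uint8_t)(v << k) ^ clmul8(hi, RAID6_POLY_LOW));
}

/* Compare the one-step version against k repeated gf_mul_x() calls for
 * every byte value and k = 1..4; returns the number of checks performed. */
static int gf_mulx_selftest(void)
{
	int checks = 0;
	for (unsigned v = 0; v < 256; v++) {
		uint8_t ref = (uint8_t)v;
		for (unsigned k = 1; k <= 4; k++) {
			ref = gf_mul_x(ref); /* ref now equals v * x^k */
			assert(gf_mul_xk((uint8_t)v, k) == ref);
			checks++;
		}
	}
	return checks;
}
```

Calling gf_mulx_selftest() from a main() exercises all 1024 cases. The k <= 4 limit mirrors the "up to 4 such operations" in the description: with k = 5 the folded term could reach degree 8 and would need a second reduction step.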
>

Relative before and after (using raid6test rather than the kernel module this
time, so they should not be compared with the numbers above):

before
raid6: neonx1 xor() 1773 MB/s
raid6: neonx2 xor() 2362 MB/s
raid6: neonx4 xor() 3223 MB/s
raid6: neonx8 xor() 3375 MB/s

after
raid6: neonx1 xor() 2259 MB/s
raid6: neonx2 xor() 2975 MB/s
raid6: neonx4 xor() 3404 MB/s
raid6: neonx8 xor() 3788 MB/s

So your estimate is correct: 12% speedup for neonx8 in the 'start = 7, stop = 13' case

-- 
Ard.
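The 12% figure for neonx8 can be reproduced from the raid6test numbers above, assuming it is simply the relative throughput gain (after / before - 1). The following C sketch computes that gain for each unroll factor; the array and function names are illustrative only.

```c
#include <stdio.h>

/* raid6test throughput in MB/s, copied from the before/after lists above. */
static const char *const variant[] = { "neonx1", "neonx2", "neonx4", "neonx8" };
static const double before_mbs[] = { 1773.0, 2362.0, 3223.0, 3375.0 };
static const double after_mbs[]  = { 2259.0, 2975.0, 3404.0, 3788.0 };

/* Relative throughput gain in percent: (after / before - 1) * 100. */
static double speedup_pct(double before, double after)
{
	return (after / before - 1.0) * 100.0;
}

/* Print the gain for each NEON unroll factor. */
static void print_speedups(void)
{
	for (size_t i = 0; i < sizeof(before_mbs) / sizeof(before_mbs[0]); i++)
		printf("%s: %+.1f%%\n", variant[i],
		       speedup_pct(before_mbs[i], after_mbs[i]));
}
```

For neonx8 this gives 3788 / 3375 - 1, roughly 12.2%, consistent with the 12% quoted in the reply.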