Re: Using Video cards (CUDA) for RAID parity

joystick <joystick@xxxxxxxxxxxxx> · Thu, 12 Dec 2013 18:51:40 +0100

On 12/12/2013 11:27, Pieter De Wit wrote:
> Hi List,
>
> Given the recent work done with techs like CUDA etc. - has the idea
> been floated to use the video card for RAID parity calculations vs the
> CPU ?

Sending the XOR computation to the GPU is like shooting a fly with a cannon.

The bandwidth to the GPU would be the bottleneck by 2 orders of 
magnitude if you try to do this.

XOR is far too simple an operation. Even if it were a stream of double *
double multiplications, the bottleneck would still be the bandwidth
to/from the GPU.
You gain something only with work like a matrix multiplication, where each
float or double is uploaded once but reused many times across all the
row x column multiplications.
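
To put rough numbers on that, here is a minimal Python sketch that counts
operations per byte moved across the bus for the three cases above. The
function names and the simple cost model (every input uploaded once, every
result downloaded once, nothing cached on the card) are my own assumptions
for illustration, not measurements.

```python
def xor_parity_intensity(n_data_blocks: int) -> float:
    """Byte-XOR operations per byte moved for one parity stripe.

    Assumed model: each data block is uploaded once and one parity
    block of the same size is downloaded.
    """
    if n_data_blocks < 2:
        raise ValueError("need at least two data blocks")
    xors_per_position = n_data_blocks - 1    # XORs needed to fold n bytes into one
    bytes_per_position = n_data_blocks + 1   # n bytes uploaded + 1 byte downloaded
    return xors_per_position / bytes_per_position


def elementwise_multiply_intensity(element_size: int = 8) -> float:
    """Multiplications per byte moved for c[i] = a[i] * b[i]."""
    # One multiplication costs two uploads and one download.
    return 1 / (3 * element_size)


def matmul_intensity(n: int, element_size: int = 8) -> float:
    """Floating-point operations per byte moved for an n x n matrix product."""
    flops = 2 * n**3                          # about n^3 multiplies and n^3 adds
    bytes_moved = 3 * n * n * element_size    # upload A and B, download C
    return flops / bytes_moved


print("XOR parity, 4 data blocks:", xor_parity_intensity(4))
print("double * double stream:   ", elementwise_multiply_intensity(8))
print("1024 x 1024 double matmul:", matmul_intensity(1024, 8))
```

Under this model the XOR parity ratio stays below 1 no matter how many
blocks are in the stripe, while the matrix product's ratio grows linearly
with n. Only the latter gives the card enough work per transferred byte to
hide the transfer cost.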

The best performers on the GPU are self-contained applications, which
run autonomously for a very long time and communicate very little with
the CPU.

The XOR computation is already more than fast enough on modern
processors. There is a benchmark for this at boot:

dmesg | grep "raid6: using algorithm"

returns:

[    5.072162] raid6: using algorithm sse2x4 (7556 MB/s)

That is 7.5 GB/s, and that's the raid6 computation, not even plain XOR.
It is probably single-threaded, too.
(The figure probably does not include the memory-copy overhead.)
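
For anyone unsure what "plain XOR" means here: single-parity RAID computes
the parity block as the bytewise XOR of the data blocks in a stripe, and
rebuilds a missing block by XORing the parity with the survivors. The
sketch below only illustrates that idea in Python. The function names are
hypothetical and this is not how the kernel implements it (the kernel uses
SIMD routines such as the sse2x4 one reported above).

```python
def xor_parity(blocks):
    """Return the bytewise XOR of a list of equally sized byte blocks."""
    if not blocks:
        raise ValueError("need at least one block")
    size = len(blocks[0])
    if any(len(block) != size for block in blocks):
        raise ValueError("all blocks must have the same length")
    parity = bytearray(size)              # starts as all zero bytes
    for block in blocks:
        for i, byte in enumerate(block):
            parity[i] ^= byte             # one XOR per data byte
    return bytes(parity)


def rebuild_missing(surviving_blocks, parity):
    """Recover the single missing data block from the survivors and the parity."""
    return xor_parity(list(surviving_blocks) + [parity])


data = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
parity = xor_parity(data)
recovered = rebuild_missing([data[0], data[2]], parity)
print(recovered == data[1])
```

Each data byte is touched by exactly one XOR, which is why the computation
is limited by how fast the bytes can be moved rather than by arithmetic.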
