Re: How to stress test a RAID 6 array?

On Mon, 3 Oct 2011, Marcin M. Jessa wrote:

> Now I would like to stress test the array and see whether it would fail again or not. What would be the best way to do that?

Once upon a time I used to write test & diagnostic software for some custom-designed (and, at the time, rather big) systems... One of the early things I learned was that no matter what tests I thought of, some end-user would find a workload that triggered some weird border case the test software had failed to catch. So it's going to vary, depending on exactly what you're trying to test...

However, a soak-test I apply to all my servers goes as follows:

1. create a file of 2 x RAM size:

  dd if=/dev/urandom of=testfile00 bs=1M count=8192 # 8GB file, 4GB RAM

2. copy this file to another file and copy that to another, etc.

  cp -a testfile00 testfile01
  cp -a testfile01 testfile02

and so on, obviously in a loop - see the sketch below. Do this until the disk is full. (Use a file of double RAM size to hopefully eliminate the effects of any Linux FS/block cache/buffering.)
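Something like this, say (a minimal sketch, assuming bash and the testfileNN naming above; cp failing on a full disk is the loop's exit):

  i=0
  while :; do
    src=$(printf 'testfile%02d' $i)
    dst=$(printf 'testfile%02d' $((i + 1)))
    cp -a "$src" "$dst" || break  # ENOSPC ends the loop
    i=$((i + 1))
  done
  rm -f "$dst"  # the last copy is truncated; discard it before checksumming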

Compare MD5 checksums of testfile00 and testfileXX.
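A quick way to check the whole chain at once (every copy should hash the same):

  md5sum testfile* | awk '{print $1}' | sort -u
  # one line of output = all copies identical;
  # more than one = a bit got flipped somewhere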

Using non-blocking I/O (dd with iflag=nonblock/oflag=nonblock) might put more stress on the system by trying to overlap reads and writes, but I've not checked.
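If you want to try it, the invocation would be something like this (untested, as I say, and GNU dd only):

  dd if=testfile00 of=testfile01 bs=1M iflag=nonblock oflag=nonblock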

My aim here is to see if there have been any bit errors during the disk-fill operation - however, it won't tell me where the error happened, nor what caused it - memory, PCI bus, SATA cable, or something undetected by the disk. With multi-disk arrays it will hopefully be writing to all drives at once, but only reading from the active (non-parity/mirror) drives.

(Are there any md options that might force it to read all the drives all the time and do the parity check while in 'normal' use? Sure, it might slow things down, but for soak testing it might be handy...)
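The closest thing I know of is md's on-demand scrub, which reads every member and verifies the parity - and it can run alongside the workload above. Assuming the array is /dev/md0:

  echo check > /sys/block/md0/md/sync_action  # read all members, verify parity
  cat /proc/mdstat                            # watch the scrub's progress
  cat /sys/block/md0/md/mismatch_cnt          # non-zero means mismatches were found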

I used to test memory like this way back on some old processors that had a block-move instruction - it ran the address/data bus as fast as it could, which made for a good test - my thought is that it's hopefully doing something similar here...

You could treat it like memory and put different patterns in the first block - all zeros, all ones, alternating ones and zeros, etc. - depending on how you think the transfers are worst over the various buses. E.g. alternating 1010 might be bad on a serial bus, but will present the same pattern over a parallel bus, so if the bus is 8 bits wide, then alternating 0xFF, 0x00 (or 0xAA, 0x55) might be better - who knows.
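Pattern files are easy enough to knock up if you want to try that here (a sketch, assuming GNU coreutils; the pattern_* names are just placeholders):

  # all zeros
  dd if=/dev/zero of=pattern_00 bs=1M count=8192
  # all ones - translate zero bytes to 0xFF on the way through
  dd if=/dev/zero bs=1M count=8192 | tr '\000' '\377' > pattern_ff
  # alternating 0xAA/0x55 - start with one pair, double it up to 2MB,
  # then replicate that block out to 8GB
  printf '\252\125' > seed
  for i in $(seq 20); do cat seed seed > seed.2 && mv seed.2 seed; done
  for i in $(seq 4096); do cat seed; done > pattern_aa55
  rm -f seed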

I do know it takes a long time on today's very big disks )-:

My full soak test involves doing a Linux kernel compile at the same time as the above (in a loop: make -jX bzImage; make clean; repeat, where X = number of CPUs - see below), and doing some large FTPs to and from the box to make the network hardware work at the same time.
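The compile loop is nothing fancy - something like this, assuming a configured kernel tree (nproc is from coreutils, standing in for the X above):

  cd linux
  while :; do
    make -j"$(nproc)" bzImage && make clean
  done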

You can always throw in a burnMMX or burnBX at the same time for good measure...

Gordon

