RE: Disk Elevator




Quoting "Ross S. W. Walker" <rwalker@xxxxxxxxxxxxx>:

> BTW that parity chunk still needs to be in memory to avoid the read on
> it, no? In that case wouldn't a stride of 64 help in that case? And if
> the stride leaves out the parity chunk then will not successive
> read-aheads cause a continuous wrap of the stripe which will negate the
> effect of the stride by not having the complete stripe cached?

Hm, not really. The parity chunk is never handed over to the OS; it's internal to the hardware RAID controller. The OS doesn't know anything about it. It doesn't even know that the "disk" it is accessing is actually a RAID5 array.

Back to your example of a 4-disk RAID5, 64k chunks, 4k file system blocks.

If you set stride to 48, the OS hands the controller 3 chunks' worth of data (48 blocks x 4k = 192k), aligned with the stripes. The controller calculates parity and writes out 4 chunks (3 data, 1 parity).
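To make the arithmetic explicit, here is a minimal Python sketch of where those two stride numbers come from, assuming the geometry above (the variable names are mine, purely for illustration):

# Stride arithmetic for the example above: 4-disk RAID5,
# 64k chunks, 4k file system blocks.
CHUNK_KB = 64           # RAID chunk size
BLOCK_KB = 4            # file system block size
DISKS = 4               # disks in the RAID5 array
DATA_DISKS = DISKS - 1  # RAID5 spends one chunk per stripe on parity

blocks_per_chunk = CHUNK_KB // BLOCK_KB      # 16 blocks per chunk
data_stride = blocks_per_chunk * DATA_DISKS  # 48: covers the 3 data chunks
full_stride = blocks_per_chunk * DISKS       # 64: covers the whole 4-chunk stripe

print(data_stride, full_stride)              # prints: 48 64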

If you set stride to 64, the OS hands the controller 4 chunks' worth of data. In the best case, the first or last 3 chunks will be aligned with a stripe: the controller calculates parity on those 3 and writes out 4 chunks (3 data, 1 parity); for the remaining data chunk, it needs to read 2 chunks from the disk, calculate parity, and write 2 chunks (1 data, 1 parity). In the worst case, no 3 of the chunks line up with a stripe: the controller reads 1 chunk, calculates parity, writes out 3 chunks (2 data, 1 parity), then does the same thing again for the remaining 2 chunks of data.
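In case it helps, here is a rough Python model of the controller's work in those three cases. It's only a sketch under the assumptions above (3 data chunks per stripe, partial-stripe writes needing extra reads to recompute parity); a real controller with a decent cache can do better than this:

DATA_CHUNKS_PER_STRIPE = 3  # 4-disk RAID5: 3 data chunks + 1 parity

def controller_io(offset, chunks):
    """Count chunk reads/writes for writing `chunks` data chunks,
    starting `offset` data chunks into a stripe."""
    reads = writes = 0
    while chunks > 0:
        in_stripe = min(chunks, DATA_CHUNKS_PER_STRIPE - offset)
        if in_stripe == DATA_CHUNKS_PER_STRIPE:
            # Full stripe: parity comes straight from the new data.
            writes += DATA_CHUNKS_PER_STRIPE + 1
        else:
            # Partial stripe: read the missing chunks of the stripe
            # to recompute parity, then write new data plus parity.
            reads += DATA_CHUNKS_PER_STRIPE - in_stripe
            writes += in_stripe + 1
        chunks -= in_stripe
        offset = 0
    return (reads, writes)

print(controller_io(0, 3))  # stride 48, aligned:    (0, 4)
print(controller_io(0, 4))  # stride 64, best case:  (2, 6)
print(controller_io(1, 4))  # stride 64, worst case: (2, 6)

Either misaligned case ends up doing 2 extra reads and 2 extra writes compared to the aligned stride-48 write, which is exactly what aligning to the data portion of the stripe avoids.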

Anyhow, for large sequential reads and writes there's really not a big performance benefit (if any). The OS tends to combine and rearrange reads and writes so they are sequential, and the hardware RAID controller does the same using its cache. I tested this once with a good RAID controller, and bonnie++ (which benchmarks this kind of access) gave almost the same numbers with and without the stride option.

If disk access is random (read a block here, write a block there), there might be some benefit, though the cache in the hardware RAID controller might kick in and save the day here too. It all depends on the particular RAID controller, the workload, and the amount and type (write-back vs. write-through) of cache on the controller.

I'd say that in most cases the stride option has very little effect if you have a large battery-backed write-back cache (and a good RAID controller, that is). If you are using software RAID, or have a small and/or write-through cache, the stride option might have some effect.



