On Sun, Aug 30, 2009 at 11:56 PM, Merlin Moncure<mmoncure@xxxxxxxxx> wrote: > 192k written > raid 10: six writes > raid 5: four writes, one read (but the read and one of the writes is > same physical location) > > now, by 'same physical' location, that may mean that the drive head > has to move if the data is not in cache. > > I realize that many raid 5 implementations tend to suck. That said, > raid 5 should offer higher theoretical performance for writing than > raid 10, both for sequential and random. In the above there are two problems. 1) 192kB is not a random access pattern. Any time you're writing a whole raid stripe or more then RAID5 can start performing reasonably but that's not random, that's sequential i/o. The relevant random i/o pattern is writing 8kB chunks at random offsets into a multi-terabyte storage which doesn't fit in cache. 2) It's not clear but I think you're saying "but the read and one of the writes is same physical location" on the basis that this mitigates the costs. In fact it's the worst case. It means after doing the read and calculating the parity block the drive must then spin a full rotation before being able to write it back out. So instead of an average latency of 1/2 of a rotation you have that plus a full rotation, or 3x as much latency before the write can be performed as without raid5. It's not a fault of the implementations, it's a fundamental problem with RAId5. Even a spectacular implementation of RAID5 will be awful for random access writes. The only saving grace some hardware implementations have is having huge amounts of battery backed cache which mean that they can usually buffer all the writes for long enough that the access patterns no longer look random. If you buffer enough then you can hope you'll eventually overwrite the whole stripe and can write out the new parity without reading the old data. Or failing that you can perform the reads of the old data when it's convenient because you're reading nearby data effectively turning it into sequential i/o. -- greg http://mit.edu/~gsstark/resume.pdf -- Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance