At 04:54 PM 12/24/2005, David Lang wrote:
> RAID 5 is bad for random writes as you state, but how does it do for
> sequential writes (for example data mining where you do a large
> import at one time, but seldom do other updates)? I'm assuming a
> controller with a reasonable amount of battery-backed cache.
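The issue with RAID 5 writes centers on the need to recalculate the
parity blocks distributed across the array and then write the new
ones to physical media. For a write smaller than a full stripe, that
typically means reading the old data and old parity before the new
data and new parity can be written.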
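A minimal sketch of the parity arithmetic involved, assuming plain
XOR parity over one stripe (real controllers add rotation, stripe
sizes, and caching on top of this). The function names are
hypothetical. It shows why a full-stripe write needs no reads, while a
single-block update needs the old data and old parity first (2 reads
+ 2 writes), and how parity lets a lost block be rebuilt.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def full_stripe_parity(data_blocks):
    # Full-stripe write: parity is computed from the new data alone, no reads.
    return xor_blocks(*data_blocks)


def rmw_parity(old_parity, old_data, new_data):
    # Partial-stripe (read-modify-write) update: requires reading the old
    # data block and the old parity block before writing both new ones.
    return xor_blocks(old_parity, old_data, new_data)


def rebuild(surviving_blocks, parity):
    # A single missing block is the XOR of the parity and all survivors.
    return xor_blocks(parity, *surviving_blocks)


# Hypothetical 3-data-block stripe with 2-byte blocks.
stripe = [b"\x01\x02", b"\x10\x20", b"\x0f\x0f"]
parity = full_stripe_parity(stripe)

new_block = b"\xff\x00"
updated = rmw_parity(parity, stripe[1], new_block)
stripe[1] = new_block

# The incremental update matches a from-scratch recalculation.
assert updated == full_stripe_parity(stripe)
# Losing block 1 is recoverable from the other blocks plus parity.
assert rebuild([stripe[0], stripe[2]], updated) == stripe[1]
```

The incremental form is what makes small random writes expensive: the
controller touches two drives twice for every logical block written.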
Caches help, and the bigger the cache the better, but once you are
doing enough writes fast enough (and that doesn't take much, even with
a few GBs of cache) the recalculate-parity-and-write-it overhead will
decrease the write speed of real data. Bear in mind that the HDs'
_raw_ write speed hasn't decreased. Those HDs are pounding away as
fast as they can for you. Your _effective_ or _data level_ write speed
is what decreases due to overhead.
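The raw-versus-effective distinction can be put into a rough model,
assuming the common rule-of-thumb write penalties for small random
writes (4 physical I/Os per logical write for RAID 5, 2 for RAID 10)
and ignoring cache coalescing and full-stripe writes. The per-disk
IOPS figure and the function name are hypothetical.

```python
# Rule-of-thumb physical I/Os per small random logical write.
WRITE_PENALTY = {"RAID 10": 2, "RAID 5": 4}


def effective_write_iops(num_disks, per_disk_iops, level):
    # Raw IOPS: what the drives are actually doing, identical for both levels.
    raw = num_disks * per_disk_iops
    # Effective IOPS: logical writes delivered after the level's overhead.
    return raw / WRITE_PENALTY[level]


# Hypothetical array: 8 drives at 150 random IOPS each.
for level in ("RAID 10", "RAID 5"):
    print(level, effective_write_iops(8, 150, level))
```

Both arrays keep the drives equally busy; only the amount of useful
data written per second differs.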
Side Note: people often forget the other big reason to use RAID 10
over RAID 5. RAID 5 can survive only one HD failure; a second failure
before the rebuild completes means data loss. RAID 10 can lose up to
half the HDs in the array without data loss, as long as you don't get
unlucky and lose both members of the same RAID 1 set.
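To make "unless you get unlucky" concrete, here is a small
enumeration, assuming a hypothetical layout where drives 2k and 2k+1
form mirror pair k and every combination of failed drives is equally
likely. It counts what fraction of two-drive failures destroy data on
an 8-drive array.

```python
from fractions import Fraction
from itertools import combinations


def raid10_survives(failed):
    # Hypothetical layout: drives 2k and 2k+1 are mirror pair k.
    # Data survives as long as no pair lost both members.
    pairs_hit = [d // 2 for d in failed]
    return len(set(pairs_hit)) == len(pairs_hit)


def raid5_survives(failed):
    # Single parity: at most one missing drive can be reconstructed.
    return len(failed) <= 1


def fraction_fatal(num_drives, num_failed, survives):
    combos = list(combinations(range(num_drives), num_failed))
    fatal = sum(1 for c in combos if not survives(c))
    return Fraction(fatal, len(combos))


print(fraction_fatal(8, 2, raid10_survives))  # 1/7
print(fraction_fatal(8, 2, raid5_survives))   # 1
```

With n mirror pairs, a second failure is fatal only if it hits the
surviving partner of the first, i.e. with probability 1/(2n - 1);
for RAID 5 every second failure is fatal.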
This can be seen as an example of the classic space vs. time tradeoff
in performance tuning. You can use 2x the HDs you need and implement
RAID 10 for the best performance and reliability, or you can dedicate
fewer HDs to redundancy and implement RAID 5 at the cost of (write)
performance and reliability.
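The space side of that tradeoff is simple arithmetic. A minimal
sketch, assuming single-parity RAID 5 and two-way mirroring for
RAID 10; the function name is hypothetical. It counts how many HDs
each level needs to provide a given number of drives' worth of usable
capacity.

```python
def drives_needed(data_drives, level):
    # RAID 10: every data drive is mirrored, so capacity costs 2x.
    if level == "RAID 10":
        return 2 * data_drives
    # RAID 5: one drive's worth of capacity goes to distributed parity.
    if level == "RAID 5":
        return data_drives + 1
    raise ValueError(f"unsupported level: {level}")


# Hypothetical requirement: 6 drives' worth of usable space.
for level in ("RAID 10", "RAID 5"):
    print(level, drives_needed(6, level))
```

RAID 5 saves drives, and the earlier points about write overhead and
failure tolerance are what that saving costs.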
TANSTAAFL.
Ron Peacetree