david@xxxxxxx wrote:
However, I was addressing the point that for reads you can't do any
checking until you have read in all the blocks.
If you never check the consistency, how will it ever be proven otherwise?
A scheme often used is to mark the disk/slice as "clean" during clean
system shutdown (or RAID device shutdown). When it comes back up, it is
assumed clean. Why wouldn't it be clean?
However, if it comes up "unclean", this does indeed require an EXPENSIVE
resynchronization process. Note, however, that resynchronization usually
reads or writes all disks, whether RAID 1, RAID 5, RAID 6, or RAID 1+0.
My RAID 1+0 does a full resynchronization if shut down uncleanly. There
is nothing specific about RAID 5 here.
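
To make the clean-flag scheme concrete, here is a minimal Python sketch. It is
a toy model only, not how md or any real controller stores its state; the
class and method names (ToyArray, start, shutdown, crash) are made up for
illustration, and the resync cost is simply counted in blocks.

```python
class ToyArray:
    """Toy model of an array that records a 'clean' flag at shutdown."""

    def __init__(self, num_disks, blocks_per_disk):
        self.num_disks = num_disks
        self.blocks_per_disk = blocks_per_disk
        self.clean = True          # flag kept in an imaginary superblock
        self.blocks_resynced = 0   # total resync work done so far

    def start(self):
        """Bring the array up; resync only if the last stop was unclean."""
        if not self.clean:
            # Expensive path: every block on every disk is read or written.
            self.blocks_resynced += self.num_disks * self.blocks_per_disk
        # While the array is running, it is marked unclean.
        self.clean = False

    def shutdown(self):
        """Clean shutdown: writes are flushed and the flag is set."""
        self.clean = True

    def crash(self):
        """Power loss: nothing is updated, so the flag stays unclean."""
        pass


array = ToyArray(num_disks=4, blocks_per_disk=1000)
array.start()
array.shutdown()
array.start()                      # came up clean: no resync
after_clean = array.blocks_resynced
array.crash()
array.start()                      # came up unclean: full resync
after_crash = array.blocks_resynced
print(after_clean, after_crash)
```

Note that nothing in the sketch depends on the RAID level: the cost of the
unclean path is the whole array either way.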
Now, technically - none of these RAID levels requires a full
resynchronization, even though it is almost always recommended and
performed by default. There is an option in Linux software RAID (mdadm)
to "skip" the resynchronization process. The danger here is that you
could read a block this minute and get one result, then read the
same block a minute later and get a different result. This could
occur in RAID 1 if reads are balanced round-robin, or sent to the disk
whose head is nearest the desired block, or whatever, and the array
picked a different disk each time. What is the worst that can happen though? Any
system that does careful journalling / synchronization should usually be
fine. The "risk" is similar to write caching without battery backing, in
that if the drive tells the system "write complete", and the system goes
on to perform other work, but the write is not complete, then corruption
becomes a possibility.
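
A small Python sketch of the inconsistent-read case, under loud assumptions:
a two-disk mirror, strict round-robin read balancing, and a crash simulated
by a hypothetical fail_after_first argument. ToyMirror is an illustration,
not a model of any real RAID implementation.

```python
import itertools


class ToyMirror:
    """Toy two-disk RAID 1 with round-robin reads and no resync."""

    def __init__(self):
        self.disks = [{}, {}]                  # block number -> data
        self._next = itertools.cycle([0, 1])   # round-robin read balancing

    def write(self, block, data, fail_after_first=False):
        self.disks[0][block] = data
        if fail_after_first:
            return   # simulated crash before the second copy is written
        self.disks[1][block] = data

    def read(self, block):
        # Each read goes to the next disk in turn.
        return self.disks[next(self._next)][block]


mirror = ToyMirror()
mirror.write(7, b"old")
mirror.write(7, b"new", fail_after_first=True)   # mirrors now disagree

first = mirror.read(7)    # served by disk 0
second = mirror.read(7)   # served by disk 1
print(first, second)
```

Without a resync, the two reads of block 7 return different data even though
nothing was written in between.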
Anyways - point is again that RAID 5 is not special here.
but for your application, the fact that you are doing lots of fsyncs
is what's killing you, because the fsync forces a lot of data to be
written out, swamping the caches involved, and requiring that you wait
for seeks. Nothing other than a battery-backed disk cache of some sort
(either on the controller or a solid-state drive on a journaled
filesystem) would work.
Yep. :-)
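
For anyone wanting to see how the number of fsyncs scales with the write
pattern, here is a rough Python sketch. The write_records helper and its
fsync_every parameter are made up for this example. It only counts fsync
calls; it does not measure seek time. Note that batching changes the
durability guarantee (a crash can lose up to one batch), so this is not a
substitute for a battery-backed cache.

```python
import os
import tempfile


def write_records(path, records, fsync_every):
    """Append records, calling fsync after every `fsync_every` records.

    Returns the number of fsync calls made.
    """
    fsyncs = 0
    with open(path, "ab") as f:
        for i, rec in enumerate(records, start=1):
            f.write(rec)
            if i % fsync_every == 0:
                f.flush()               # push Python's buffer to the OS
                os.fsync(f.fileno())    # ask the OS to push it to the disk
                fsyncs += 1
        if len(records) % fsync_every != 0:
            # Flush the final partial batch.
            f.flush()
            os.fsync(f.fileno())
            fsyncs += 1
    return fsyncs


records = [b"x" * 100 for _ in range(200)]
with tempfile.TemporaryDirectory() as d:
    per_record = write_records(os.path.join(d, "a.log"), records, 1)
    batched = write_records(os.path.join(d, "b.log"), records, 50)
print(per_record, batched)
```

Each of those fsync calls is a point where the application has to wait for
the disk unless something with battery backing can acknowledge the write
early.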
Cheers,
mark
--
Mark Mielke <mark@xxxxxxxxx>