On Tue, 2006-10-10 at 07:33 +1000, Neil Brown wrote:
> On Monday October 9, dledford@xxxxxxxxxx wrote:
> >
> > The original email was about raid1 and the fact that reads from
> > different disks could return different data.
>
> To be fair, the original mail didn't mention "raid1" at all.  It did
> mention raid5 and raid6 as a possible contrast, so you could
> reasonably get the impression that it was talking about raid1.  But
> that wasn't stated.

OK, well I got that impression from the contrast ;-)

> Otherwise I agree.  There is no real need to perform the sync of a
> raid1 at creation.
>
> However it seems to be a good idea to regularly 'check' an array to
> make sure that all blocks on all disks get read, to find sleeping bad
> blocks early.  If you didn't sync first, then every check will find
> lots of errors.  Of course you could 'repair' instead of 'check'.  Or
> do that once.  Or something.
>
> For raid6 it is also safe to not sync first, though with the same
> caveat as raid1.  Raid6 always updates parity by reading all blocks
> in the stripe that aren't known and calculating P and Q.  So the
> first write to a stripe will make P and Q correct for that stripe.
> This is current behaviour.  I don't think I can guarantee it will
> never change.
>
> For raid5 it is NOT safe to skip the initial sync.  It is possible
> for all updates to be "read-modify-write" updates, which assume the
> parity is correct.  If it is wrong, it stays wrong.  Then when you
> lose a drive, the parity blocks are wrong, so the data you recover
> using them is wrong.

If superblock->init_flag == FALSE, then make all writes
parity-generating writes rather than parity-updating writes (less
efficient, so you would want to resync the array and clear the flag
before too long, but possible).  Toy sketches of the parity math
involved (the raid6 P/Q calculation, and the two raid5 write paths)
are appended at the end of this mail.

> In summary, it is safe to use --assume-clean on a raid1 or raid10,
> though I would recommend a "repair" before too long.  For other raid
> levels it is best avoided.
>
> >
> > Probably the best thing to do would be on create of the array,
> > setup a large all 0 block of mem and repeatedly write that to all
> > blocks in the array devices except parity blocks and use a large
> > all 1 block for that.
>
> No, you would want 0 for the parity block too.  0 + 0 = 0.

Sorry, I was thinking odd parity.

> > Then you could just write the entire array at blinding speed.  You
> > could call that the "quick-init" option or something.  You wouldn't
> > be able to use the array until it was done, but it would be quick.
>
> I doubt you would notice it being faster than the current
> resync/recovery that happens on creation.  We go at device speed -
> either the bus or the storage device, depending on which is slower.

There is memory overhead, though, and that can impact other work the
CPU might be doing while the recovery is in progress.  (A rough sketch
of the zero-fill idea is also appended at the end of this mail.)

> > If you wanted to be *really* fast, at least for SCSI drives you
> > could write one large chunk of 0's and one large chunk of 1's at
> > the first parity block, then use the SCSI COPY command to copy the
> > 0 chunk everywhere it needs to go, and likewise for the parity
> > chunk, and avoid transferring the data over the SCSI bus more than
> > once.
>
> Yes, that might be measurably faster.  It is the sort of thing you
> might do in a "hardware" RAID controller, but I doubt it would ever
> get done in md (there is a price for being very general).

Bleh... sometimes I really dislike always making things cater to the
lowest common denominator... you're never as good as you could be,
and you're always as bad as the worst case...
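
For the curious, here is roughly what "calculating P and Q" involves.
This is a toy sketch, not the md code: one byte stands in for a whole
block, and it assumes the standard raid6 construction (P is the XOR of
the data blocks, Q is a weighted sum over GF(2^8) using the 0x11d
reduction polynomial).

/*
 * Toy sketch of the raid6 P/Q calculation.  Not the md code: one
 * byte stands in for an entire block, and ndisks is the number of
 * data disks in the stripe.
 */
#include <stdio.h>
#include <stdint.h>

/* Multiply by x (i.e. 2) in GF(2^8) mod x^8+x^4+x^3+x^2+1 (0x11d). */
static uint8_t gf_mul2(uint8_t v)
{
    return (uint8_t)((v << 1) ^ ((v & 0x80) ? 0x1d : 0));
}

/* P = d[0] ^ d[1] ^ ...; Q = sum of 2^i * d[i] over GF(2^8),
 * evaluated with Horner's rule from the highest-numbered disk down. */
static void compute_pq(const uint8_t *d, int ndisks,
                       uint8_t *p, uint8_t *q)
{
    uint8_t P = 0, Q = 0;
    int i;

    for (i = ndisks - 1; i >= 0; i--) {
        P ^= d[i];
        Q = gf_mul2(Q) ^ d[i];
    }
    *p = P;
    *q = Q;
}

int main(void)
{
    uint8_t d[4] = { 0x11, 0x22, 0x33, 0x44 };
    uint8_t p, q;

    compute_pq(d, 4, &p, &q);
    printf("P=%02x Q=%02x\n", p, q);
    return 0;
}

Because both P and Q are computed from the whole stripe in that write
path, the pre-existing garbage on an unsynced array never feeds into
them, which is why the first write to a stripe fixes it.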
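
And here is a toy model of the two raid5 write paths (made-up names,
not the md code), to make the read-modify-write problem concrete:
read-modify-write folds the old parity into the new one, so garbage
parity stays garbage, while a reconstruct-write recomputes parity from
the whole stripe.  The init_flag idea above amounts to forcing the
second path until a real resync has run.

/*
 * Toy model of the two raid5 parity update paths, using 4-byte
 * "blocks" and a 3-disk array (2 data + 1 parity).  Names are made
 * up for illustration; this is not the md kernel code.
 */
#include <stdio.h>
#include <stdint.h>

#define NBLK 4

/* Read-modify-write: new_parity = old_parity ^ old_data ^ new_data.
 * Correct only if the old parity was already consistent. */
static void rmw_update(uint8_t *parity, const uint8_t *old_data,
                       const uint8_t *new_data)
{
    int i;
    for (i = 0; i < NBLK; i++)
        parity[i] ^= old_data[i] ^ new_data[i];
}

/* Reconstruct-write: recompute parity from every data block in the
 * stripe.  Always produces consistent parity, whatever was on the
 * parity disk before, at the cost of reading the whole stripe. */
static void reconstruct_write(uint8_t *parity, const uint8_t *d0,
                              const uint8_t *d1)
{
    int i;
    for (i = 0; i < NBLK; i++)
        parity[i] = d0[i] ^ d1[i];
}

int main(void)
{
    uint8_t d0[NBLK] = { 0x11, 0x22, 0x33, 0x44 };
    uint8_t d1[NBLK] = { 0xaa, 0xbb, 0xcc, 0xdd };
    /* Parity is garbage because the array was never synced. */
    uint8_t parity[NBLK] = { 0xde, 0xad, 0xbe, 0xef };
    uint8_t new_d0[NBLK] = { 0x55, 0x66, 0x77, 0x88 };

    rmw_update(parity, d0, new_d0);
    /* Still wrong: the original garbage is preserved. */
    printf("after rmw: %02x (want %02x)\n",
           parity[0], (unsigned)(new_d0[0] ^ d1[0]));

    reconstruct_write(parity, new_d0, d1);
    printf("after rcw: %02x (want %02x)\n",
           parity[0], (unsigned)(new_d0[0] ^ d1[0]));
    return 0;
}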
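
Finally, a rough sketch of the quick-init idea.  zero_fill() is a
made-up helper, not anything in mdadm; you would run it once per
member device.  Since the XOR of all-zero data is zero, the same
buffer covers the parity blocks too, as Neil points out above.

/*
 * Hypothetical "quick-init": stream one big zeroed buffer across an
 * entire member device.
 */
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

#define CHUNK (1 << 20)                 /* 1 MiB per write */

static int zero_fill(const char *dev, off_t dev_size)
{
    unsigned char *buf = calloc(1, CHUNK);  /* calloc gives all zeros */
    int fd, ret = -1;
    off_t off;

    if (!buf)
        return -1;
    fd = open(dev, O_WRONLY);
    if (fd < 0)
        goto out_free;

    for (off = 0; off < dev_size; off += CHUNK) {
        size_t len = (dev_size - off) < CHUNK
                     ? (size_t)(dev_size - off) : CHUNK;
        if (pwrite(fd, buf, len, off) != (ssize_t)len)
            goto out_close;
    }
    ret = 0;
out_close:
    close(fd);
out_free:
    free(buf);
    return ret;
}

Whether that actually beats the current resync is Neil's point above -
you're device-bound either way - but it does skip the reads and the
XOR work, which is where the overhead I mentioned lives.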
-- 
Doug Ledford <dledford@xxxxxxxxxx>  GPG KeyID: CFBFF194
http://people.redhat.com/dledford
Infiniband specific RPMs available at
http://people.redhat.com/dledford/Infiniband