On Mon, 2006-10-09 at 15:10 -0400, Rob Bray wrote:
> > On Mon, 2006-10-09 at 15:49 +0200, Erik Mouw wrote:
> > >
> > > There is no way to figure out what exactly is correct data and what is
> > > not. It might work right after creation and during the initial install,
> > > but after the next reboot there is no way to figure out what blocks to
> > > believe.
> >
> > You don't really need to. After a clean install, the operating system
> > has no business reading any block it didn't write to during the install
> > unless you are just reading disk blocks for the fun of it. And any
> > program that depends on data that hasn't first been written to disk is
> > just wrong and stupid anyway.
>
> I suppose a partial-stripe write would read back junk data on the other
> disks, xor with your write, and update the parity block.

The original email was about raid1 and the fact that reads from
different disks could return different data. For that scenario, my
comments are accurate. For the parity based raids, you never have two
disks with the same block, so you would only ever get different results
if you had a disk fail and the parity was never initialized. For that
situation, you would need to init the parity on any stripe that has been
even partially written to. Totally unwritten stripes could have any
parity you want since the data is undefined anyway, so who cares if it
changes when a disk fails and you are reconstructing from parity.

> If you benchmark the disk, you're going to be reading blocks you didn't
> necessarily write, which could kick out consistency errors.

The only benchmarks I know of that give a rat's ass about data
integrity are ones that write a pattern first and then read it back. In
that case, parity would have been init'ed during the write.

> A whole-array consistency check would puke on the out-of-whack parity data.

Or a whole array consistency check on an array that hasn't had a whole
array parity init makes no sense.
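To make the partial-stripe point concrete, here is a minimal sketch in
Python, assuming plain XOR parity over one stripe (as in RAID 5). The
helper names (xor_blocks, rmw_write, reconstruct) and the byte values are
made up for illustration and are not md code. It shows a read-modify-write
update on a stripe whose parity was never initialized, and the same update
after the stripe's parity has been computed first.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """XOR equal-length blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rmw_write(stripe, parity, index, new_data):
    """Read-modify-write: new parity = old parity ^ old data ^ new data."""
    new_parity = xor_blocks(parity, stripe[index], new_data)
    updated = list(stripe)
    updated[index] = new_data
    return updated, new_parity


def reconstruct(stripe, parity, lost):
    """Rebuild the block on a failed disk from parity and the survivors."""
    survivors = [blk for i, blk in enumerate(stripe) if i != lost]
    return xor_blocks(parity, *survivors)


# A never-initialized stripe: arbitrary junk data and unrelated junk parity.
junk = [b"\x13\x37", b"\xde\xad", b"\xbe\xef"]
junk_parity = b"\x00\x01"
new = b"\xca\xfe"

# Partial-stripe write on top of the junk parity, then disk 0 fails.
stripe, parity = rmw_write(junk, junk_parity, 0, new)
bad = reconstruct(stripe, parity, 0)

# Same write, but the stripe's parity is initialized first.
good_parity = xor_blocks(*junk)
stripe2, parity2 = rmw_write(junk, good_parity, 0, new)
ok = reconstruct(stripe2, parity2, 0)

print(bad == new, ok == new)
```

The first reconstruction does not return the data that was written,
because the update only carried the stale parity forward. The second one
does, which is why a stripe needs its parity computed before (or as part
of) its first partial write.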
You could create the array without touching the parity, update parity on
all stripes that are written, leave a flag in the superblock indicating
the array has never been init'ed, and in the event of failure you can
use the parity safe in the knowledge that all stripes that have been
written to have valid parity and all other stripes we don't care about.
The main problem here is that if we *did* need a consistency check, we
couldn't tell errors from uninit'ed stripes.

You could also make it so that the first time you run a consistency
check with the uninit'ed flag set in the superblock, you calculate all
parity and then clear the flag. On all subsequent runs you would then
know when you have an error as opposed to an uninit'ed block.

Probably the best thing to do would be, on create of the array, to set up
a large all-0 block of memory and repeatedly write that to all blocks in
the array devices, parity blocks included (the XOR parity of all-zero
data is itself all zeros, so the same buffer serves for both). Then you
could just write the entire array at blinding speed. You could call that
the "quick-init" option or something. You wouldn't be able to use the
array until it was done, but it would be quick.

If you wanted to be *really* fast, at least for SCSI drives you could
write one large chunk of 0's once, then use the SCSI COPY command to
copy that chunk everywhere it needs to go, and avoid transferring the
data over the SCSI bus more than once.

-- 
Doug Ledford <dledford@xxxxxxxxxx>
              GPG KeyID: CFBFF194
              http://people.redhat.com/dledford

Infiniband specific RPMs available at
              http://people.redhat.com/dledford/Infiniband