RE: RAID5 rebuild question

This is worth saving!!!!

I did want to create a list of frequent problems and how to correct them,
but never found the time.  I don't know of any FAQ pages.  This mailing list
is it!  :)

Guy

> -----Original Message-----
> From: linux-raid-owner@xxxxxxxxxxxxxxx [mailto:linux-raid-
> owner@xxxxxxxxxxxxxxx] On Behalf Of Neil Brown
> Sent: Sunday, July 03, 2005 9:21 PM
> To: Guy
> Cc: 'Christopher Smith'; linux-raid@xxxxxxxxxxxxxxx
> Subject: RE: RAID5 rebuild question
> 
> On Sunday July 3, bugzilla@xxxxxxxxxxxxxxxx wrote:
> > It looks like it is rebuilding to a spare or new disk.
> 
> Yep.
> 
> > If this is a new array, I would think that create would be writing
> > to all disks, but not sure.
> 
> Nope....
> 
> When creating a new raid5 array, we need to make sure the parity
> blocks are all correct (obviously).  There are several ways to do
> this.
> 
> 1/ Write zeros to all drives.  This would make the array unusable
>    until the clearing is complete, so it isn't a good option.
> 2/ Read all the data blocks, compute the parity block, and then write
>    out the parity block.  This works, but is not optimal.  Remembering
>    that the parity block is on a different drive for each 'stripe',
>    think about what the read/write heads are doing.
>    The heads on the 'reading' drives will be somewhere ahead of the
>    heads on the 'writing' drive.  Every time we step to a new stripe
>    and change which is the 'writing' head, the other reading heads
>    have to wait for the head that has just changed from 'writing' to
>    'reading' to catch up (finish writing, then start reading).
>    Waiting slows things down, so this is uniformly sub-optimal.
> 3/ Read all data blocks and parity blocks, check each parity block
>    to see if it is correct, and write out a new block only if it isn't.
>    This works quite well if most of the parity blocks are correct as
>    all heads are reading in parallel and are pretty-much synchronised.
>    This is how the raid5 'resync' process in md works.  It happens
>    after an unclean shutdown if the array was active at crash-time.
>    However if most or even many of the parity blocks are wrong, this
>    process will be quite slow as the parity-block drive will have to
>    read-a-bunch, step-back, write-a-bunch.  So it isn't good for
>    initially setting the parity.
> 4/ Assume that the parity blocks are all correct, but that one drive
>    is missing (i.e. the array is degraded).  This is repaired by
>    reconstructing what should have been on the missing drive, onto a
>    spare.  This involves reading all the 'good' drives in parallel,
>    calculating the missing block (whether data or parity) and writing
>    it to the 'spare' drive.  The 'spare' will be written to a few (10s
>    or 100s of) blocks behind the blocks being read off the 'good'
>    drives, but each drive will run completely sequentially and so at
>    top speed.  (The XOR arithmetic is sketched just below.)
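> 
> To make the XOR arithmetic concrete, here is a tiny C sketch.  It is
> only an illustration (the block size and function name are made up);
> it is not the actual md code:
> 
>     #include <stddef.h>
>     #include <string.h>
> 
>     #define BLKSIZE 4096   /* bytes per block; illustrative only */
> 
>     /* XOR 'n' source blocks together into 'dst'.  Computing a
>      * parity block (options 2 and 3) and reconstructing a missing
>      * block onto a spare (option 4) are both exactly this loop:
>      * XOR is its own inverse, so the XOR of all the surviving
>      * blocks in a stripe *is* the missing block, data or parity
>      * alike. */
>     void xor_blocks(unsigned char src[][BLKSIZE], int n,
>                     unsigned char dst[BLKSIZE])
>     {
>         memset(dst, 0, BLKSIZE);
>         for (int d = 0; d < n; d++)
>             for (size_t i = 0; i < BLKSIZE; i++)
>                 dst[i] ^= src[d][i];
>     }
> 
> Because computing parity and reconstructing a block are the same
> loop over a different set of source blocks, option 4 can rebuild
> data blocks and parity blocks alike onto the spare.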
> 
> On a new array where most of the parity blocks are probably bad, '4'
> is clearly the best option. 'mdadm' makes sure this happens by creating
> a raid5 array not with N good drives, but with N-1 good drives and one
> spare.  Reconstruction then happens, and you should see exactly what
> was reported: reads from all but the last drive, writes to that last
> drive.
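> 
> For example, a command like the following (device names here are
> purely illustrative):
> 
>     mdadm --create /dev/md0 --level=5 --raid-devices=4 \
>           /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1
> 
> starts exactly such a reconstruction onto the last device, and
> /proc/mdstat shows its progress while it runs.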
> 
> This should go in a FAQ.  Is anyone actively maintaining an md/mdadm
> FAQ at the moment, or should I start putting something together??
> 
> NeilBrown
> 

-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html
