RE: SCSI Problem 2.4.19-pre2-ac3

On Monday March 11, fli@sonera.se wrote:
> 
> <snip>
> 
> > 11 Seagate 180GB disks in software RAID5 config  (note that
> > "Array Size" below looks suspicious -- overflowing?  Also --
> > why 13 disks when only 12 requested?)
> 
> <snip>
> 
> > Build with: mdadm --create /dev/md6 --chunk=128 --level=5 --raid-disks=12 
> > --spare-disks=0 /dev/sd[d-o]1
> > /dev/md6:
> >         Version : 00.90.00
> >   Creation Time : Sat Mar  9 14:34:17 2002
> >      Raid Level : raid5
> >      Array Size : -197258624
> >     Device Size : 177293184 (169 GiB)
> >      Raid Disks : 12
> >     Total Disks : 13
> > Preferred Minor : 6
> >     Persistance : Superblock is persistant
> > 
> >     Update Time : Sat Mar  9 14:34:17 2002
> >           State : dirty, no-errors
> >   Active Drives : 11
> >  Working Drives : 12
> >   Failed Drives : 1
> >    Spare Drives : 1
> 
> I've seen this too. It has one spare drive even though you specify
> --spare-disks=0 on the command line. That seems to be where the
> extra disk comes from. In my case the disk layout looked kinda weird
> when issuing "mdadm --detail /dev/mdX". That was the reason for the
> one failed drive, at least in my case.

When you build a raid5 array with mdadm (and don't use --force), it
will make one of the devices that you specified a spare, and leave one
slot in the real array empty.
This way, correct parity is established by doing a reconstruction
rather than a parity resync.
On drives that hold unknown data, a reconstruction can be expected to
be much faster.
This is because all the drives with "good" data are read sequentially,
and the one drive that is being reconstructed is written sequentially,
allowing all devices to stream (if the CPU can keep up with the XOR).
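
To make that concrete: the --create line in the quoted message ends up
behaving much as if you had spelled the layout out by hand, roughly
like the sketch below (same option spellings and device names as the
quoted command; the "missing" keyword may not be available in every
mdadm version, so treat this as an illustration, not as exactly what
mdadm does internally):

   # 11 real devices plus "missing" fill the 12 raid slots; the last
   # disk goes in as a spare, so md reconstructs onto it instead of
   # doing a parity resync across all 12 members.
   mdadm --create /dev/md6 --chunk=128 --level=5 --raid-disks=12 \
         --spare-disks=1 /dev/sd[d-n]1 missing /dev/sdo1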

Parity resync, by contrast, reads all drives in parallel and checks
the parity; when it finds a parity block that is wrong, it re-writes
it.  This means that if most parity blocks are already correct, then
all drives stream with minimal backward seeking.  But if lots of
parity blocks are wrong, you get lots of seeking as the parity blocks
are re-written, and hence much lower throughput.
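
You can see which of the two passes md is doing by looking at
/proc/mdstat while it runs; a rebuild onto the spare shows up as
"recovery" and the parity pass as "resync" (exact wording can vary a
little between kernel versions):

   # "recovery" = reconstructing onto the spare
   # "resync"   = the parity check-and-rewrite pass
   cat /proc/mdstat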

For a new array, then, a reconstruction is much better than a resync,
so mdadm tries to encourage that method.

Thus when you create a 12-disk array, mdadm actually creates a 13-disk
array with one slot "missing" (a.k.a. failed) and one spare; that is
where the "Total Disks : 13" in the output above comes from.
After the reconstruction completes, the spare and the missing drive
are swapped, but the missing drive is still counted in the totals.
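
Once the reconstruction has finished, re-running the command from the
quoted message should show the former spare counted among the active
drives, with only the totals still remembering the extra slot:

   # check the drive counts again after the rebuild completes
   mdadm --detail /dev/md6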

If you want to create an array and use a parity resync instead, you
can just add "--force" to the above mdadm line.  If the drives were
already part of a raid5 array (so most of the parity is already
correct), it will go at much the same speed; but if they weren't
(e.g. corrupt one of your drives and try again) you will find that it
is much slower.
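
Concretely, that is just the quoted command with --force added:

   # start all 12 devices as active members and establish parity
   # with a resync rather than a reconstruction
   mdadm --create /dev/md6 --chunk=128 --level=5 --raid-disks=12 \
         --spare-disks=0 --force /dev/sd[d-o]1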

The counting of drives in "md" really should be tidied up, but I would
rather wait until I have factored out all the built-in knowledge of a
specific superblock format so that I can easily introduce a new (less
redundant) superblock layout.

I hope this makes things a little less opaque.

NeilBrown

> 
> Regards,
>   Fredrik Lindgren
