On Monday July 21, madduck@xxxxxxxxxxx wrote:
> also sprach Neil Brown <neilb@xxxxxxx> [2008.07.21.0106 +0200]:
> > The "real" solution here involves assembling arrays in userspace
> > using "mdadm --incremental" from udevd, and using write-intent
> > bitmaps so that writing to an array before all the component
> > devices are available can be done without requiring a full resync.
> > There is still a bit more code needed to make that work really
> > smoothly.
>
> It was my understanding that write-intent bitmaps slow down all
> operations and are not suggested on e.g. workstations. No?

Well, they don't slow down reads.

If you have a separate root filesystem (i.e. /home and /var are
elsewhere), it is likely to be read-mostly, so a bitmap probably won't
hurt much.  And an external bitmap on a dedicated device has minimal
performance cost.

However, I suggest neither having nor not having bitmaps.  The choice
to use them involves a trade-off which I cannot make for other people.

They would, however, be very useful to cover the gap when assembling
arrays incrementally.  If, for example, you have a 6-disk raid5 array
and 5 disks have been found, what do you do?
 - wait for the 6th, which might never arrive, or
 - start degraded; then, if a write happens before the 6th disk
   arrives, the 6th disk has to be rebuilt completely.

Neither is a good option.  An alternative is:
 - add an internal bitmap, and remove it after the 6th disk has
   arrived, or after we are sure there are no more disks to find.

Doing this means that if a recovery is needed when the 6th disk
arrives, it will be very fast.

It's not hard to notice that the bitmap proposed here does not need to
be on stable storage.  It is not protecting against a crash, just
against the window while the array is degraded.  So if we could
support bitmaps on a tmpfs, we could use an external bitmap in /tmp
instead of an internal one.  Or we could even enhance the md code to
always use a bitmap, and simply not write it to storage if none was
configured.

(If a crash happens during that window between writing to the degraded
array and recovering the few blocks needed on the final device, you
would be in an unfortunate position.  For raid1/10 you would just need
a full resync, which you would have needed anyway, so no loss.  For
raid4/5/6 there is a potential for data loss, so I probably would not
make this behaviour the default for those levels...)

NeilBrown
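
For concreteness, here is a rough sketch of what the above corresponds
to on the command line (the device names /dev/md0 and /dev/sdf1 are
only placeholders, not anything from the discussion above):

    # feed a newly discovered component device to md; this is the
    # command a udev rule doing incremental assembly would run for
    # each disk as it appears
    mdadm --incremental /dev/sdf1

    # add an internal write-intent bitmap to the array while it is
    # running degraded
    mdadm --grow /dev/md0 --bitmap=internal

    # once the last disk has arrived and recovery has finished,
    # remove the bitmap again
    mdadm --grow /dev/md0 --bitmap=none

With the bitmap in place, the recovery triggered when the final disk
shows up only has to touch the blocks that were written while the
array was degraded, rather than resyncing the whole device.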