Re: Setting up md-raid5: observations, errors, questions

Michael Tokarev <mjt@xxxxxxxxxx> · Mon, 03 Mar 2008 00:56:29 +0300

Christian Pernegger wrote:
[]
 First, try to disable bitmaps on the raid array

It was pointed out recently here on linux-raid that the internal bitmap
performs poorly at its default size:

Message-ID: <47C44DDB.3050201@xxxxxxx>
Date:	Tue, 26 Feb 2008 18:35:23 +0100
From:	Hubert Verstraete <hubskml@xxxxxxx>
To:	Neil Brown <neilb@xxxxxxx>, linux-raid@xxxxxxxxxxxxxxx
Subject: internal bitmap size

Hi Neil,

Neil Brown wrote:
> For now, you will have to live with a smallish bitmap, which probably
> isn't a real problem.  With 19078 bits, you will still get a
> several-thousand-fold increase in resync speed after a crash
> (i.e. hours become seconds) and to some extent, fewer bits are better
> and you have to update them less.
>
> I haven't made any measurements to see what size bitmap is
> ideal... maybe someone should :-)

I ran some tests on a RAID-5 array of four 250GB disks, and the write
speed is really ugly with the default internal bitmap size.
Setting a bigger bitmap chunk size (16 MB for example) creates a small
bitmap. The write speed is then almost the same as when there is no
bitmap, which is great. And as you said, the resync is a matter of
seconds (or minutes) instead of hours (without bitmap).
With such a setting, I've got both a nice write speed and a nice resync
speed. That's where I would look to find MY ideal bitmap size.
....
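
To put rough numbers on the trade-off in the quoted message: the bitmap
needs about one bit per bitmap chunk, so a larger bitmap chunk means fewer
bits to update. The following is only a sketch of that arithmetic. The
function name is hypothetical, and it assumes decimal gigabytes for the
250GB figure and binary megabytes for the bitmap chunk; which size md
actually tracks for a given RAID level is not covered here.

```python
def bitmap_bits(tracked_bytes: int, bitmap_chunk_bytes: int) -> int:
    """Bits needed when each bit covers one bitmap chunk (hypothetical helper)."""
    if tracked_bytes < 0 or bitmap_chunk_bytes <= 0:
        raise ValueError("sizes must be non-negative / positive")
    # Round up: a partial chunk at the end still needs its own bit.
    return -(-tracked_bytes // bitmap_chunk_bytes)


GB = 10**9   # assumption: disk sizes quoted in decimal gigabytes
MiB = 2**20  # assumption: bitmap chunk sizes in binary megabytes

for chunk_mib in (1, 4, 16, 64):
    bits = bitmap_bits(250 * GB, chunk_mib * MiB)
    print(f"{chunk_mib:>3} MiB bitmap chunk -> {bits} bits")
```

Each fourfold increase in the bitmap chunk size cuts the number of bits
roughly fourfold, at the cost of resyncing a larger region per dirty bit.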

Maybe I did that by accident for the various vmstat data for different
RAID levels I posted previously. At least I forgot to explicitly
specify a bitmap for those tests (see above).

It's my understanding that the bitmap is a raid chunk level journal to
speed up recovery, correct? Doing that reduces the window during which
a second disk can die with catastrophic consequences -> bitmaps are a
good thing, especially on an array where a full rebuild takes hours.
Seeing as the primary purpose of the raid5 is fault tolerance I could
live with a performance penalty but why is it *that* slow?

Umm..  You mixed it all up ;)

A bitmap is a place (stored somewhere... ;) where each equally-sized
block of the array has a single bit of information - namely, whether that
block has been written recently (which means it is dirty) or not.
So for each block (which is in no way related to chunk size etc!)
we have an on/off switch telling us whether that block has to be
resynchronized when the array needs a resync, for example after a
power loss.  Only those blocks marked "dirty" in the bitmap need to
be recalculated and rewritten, not the whole array.
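
The idea above can be illustrated with a toy model. This is a sketch only,
not how md implements it: the class and method names are made up for the
example, and a plain Python list stands in for the on-disk bits.

```python
class WriteIntentBitmap:
    """Toy model: one dirty flag per equally sized block of the array."""

    def __init__(self, array_bytes: int, block_bytes: int) -> None:
        if array_bytes <= 0 or block_bytes <= 0:
            raise ValueError("sizes must be positive")
        self.array_bytes = array_bytes
        self.block_bytes = block_bytes
        nblocks = -(-array_bytes // block_bytes)  # round up
        self.dirty = [False] * nblocks

    def mark_write(self, offset: int, length: int) -> None:
        """Set the bit of every block touched by a write."""
        if length <= 0:
            return
        if offset < 0 or offset + length > self.array_bytes:
            raise ValueError("write outside the array")
        first = offset // self.block_bytes
        last = (offset + length - 1) // self.block_bytes
        for i in range(first, last + 1):
            self.dirty[i] = True

    def mark_clean(self) -> None:
        """Clear all bits once the writes are known to be in sync."""
        self.dirty = [False] * len(self.dirty)

    def blocks_to_resync(self) -> list:
        """Blocks that must be recalculated after an unclean shutdown."""
        return [i for i, d in enumerate(self.dirty) if d]


bm = WriteIntentBitmap(array_bytes=1000, block_bytes=100)
bm.mark_write(250, 100)       # bytes 250..349 touch blocks 2 and 3
print(bm.blocks_to_resync())  # → [2, 3]
```

Only two of the ten blocks would need a resync here; without the bitmap
all ten would.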

This has nothing to do with the window between the first and second disk
failure.  Once the first disk fails, the bitmap is of no use anymore,
because you will need a replacement disk, which has to be
resynchronized in whole, because it's shiny new.  The bitmap only
helps after an unclean shutdown, and only if there was no recent write
activity (which hasn't been "committed" by the md layer and the array
hasn't been re-marked as clean - that happens every 0.21 sec by
default - see /sys/block/mdN/md/safe_mode_delay).
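
For anyone who wants to check that delay on their own system, here is a
small sketch that reads the sysfs file named above. It assumes the file
holds a plain decimal number of seconds; the function name, the
sysfs_root parameter (there only to make testing easy) and the "md0"
device name are assumptions for the example.

```python
from pathlib import Path
from typing import Optional


def read_safe_mode_delay(md_name: str, sysfs_root: str = "/sys") -> Optional[float]:
    """Return safe_mode_delay in seconds, or None if it cannot be read."""
    path = Path(sysfs_root) / "block" / md_name / "md" / "safe_mode_delay"
    try:
        return float(path.read_text().strip())
    except (OSError, ValueError):
        # No such array, no permission, or unexpected file contents.
        return None


# "md0" is an assumed device name; prints None if no such array exists.
print(read_safe_mode_delay("md0"))
```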

If I put the bitmap on an external drive it will be a lot faster - but
what happens, when the bitmap "goes away" (because that disk fails,
isn't accessible, etc)?
Is it goodbye array or is the worst case a full resync? How well is
the external bitmap supported?
(That same consideration kept me from using external journals for ext3.)

If the bitmap is inaccessible, it's handled as if there were no bitmap
at all - i.e., if the array was dirty, it will be resynced as a whole;
if it was clean, nothing will be done.  The bitmap gives a set of blocks
to OMIT from resynchronisation, and if that information is unavailable...
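
The three cases above can be written down as a small decision function.
Again a sketch with made-up names, not md's actual code; a missing bitmap
is modelled as None.

```python
from typing import Optional, Set


def resync_plan(array_dirty: bool,
                dirty_blocks: Optional[Set[int]],
                total_blocks: int) -> Set[int]:
    """Return the set of block numbers that need resyncing."""
    if not array_dirty:
        # Clean array: nothing to do, bitmap or not.
        return set()
    if dirty_blocks is None:
        # Dirty array, bitmap missing or inaccessible: full resync.
        return set(range(total_blocks))
    # Dirty array with a bitmap: everything not marked dirty is omitted.
    return set(dirty_blocks)


print(len(resync_plan(True, None, 1000)))    # bitmap gone: all 1000 blocks
print(len(resync_plan(True, {3, 7}, 1000)))  # bitmap present: 2 blocks
```

So losing the bitmap costs resync time in the worst case, not the array.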

Yes, external bitmaps are supported and working.  That doesn't mean
they're faster, however - I tried placing a bitmap on a tmpfs (just
for testing) and saw about a 95% drop in speed compared to the
case with an internal bitmap (i.e., only 5% of the speed with the
bitmap on tmpfs; the bitmap size was the same).  That was a long time
ago (more than a year), so things may have changed already.

I highly doubt the chunk size makes any difference.  The bitmap is the
primary suspect here.

/mjt