Maybe crazy idea for reshaping: Instant/On-Demand reshaping

Hi,

Last night I started a reshape of a raid5 array. Now I'm up and there are
still over 15 hours until the reshape is done. That got me thinking.
Since I haven't had my coffee yet, let me use you as a sounding board.


1) Wouldn't it be great if the size of the raid increased gradually
during the reshape?

As the reshape progresses and data moves from X to X+1 disks it creates
free space. So why can't one increase the device size gradually to
include that space?

Unfortunately the space it frees is needed to reshape the later
stripes. As it reshapes a window of free space moves from the start of
the disks toward the end of the disks. For the device size to grow, the
place where a new stripe would land after reshaping needs to be free,
which means the window of free space must have moved far enough to
include that place. That in turn means X/(X+1) of the data has already
been copied. Only while copying the last 1/(X+1) of the data could the
size increase. That would still be a plus.
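To sanity-check that X/(X+1) figure, here is a toy sketch (Python; row
counts are simplified, parity placement is ignored, and all names are
made up for illustration):

```python
def exposed_extra_rows(X, copied, R_old):
    """How many rows of new capacity are usable after `copied`
    new-layout rows have been written.

    Old layout: X data disks, R_old rows of data.
    New layout: X+1 data disks; the old data fits in
    R_new = R_old * X / (X+1) rows, and rows R_new..R_old-1 are the
    extra capacity we would like to expose early."""
    R_new = R_old * X // (X + 1)
    # Writing one new row consumes (X+1)/X old rows, so the free
    # window's leading edge sits at this old-layout row:
    freed_up_to = copied * (X + 1) // X
    return max(0, min(freed_up_to, R_old) - R_new)

# 4 -> 5 data disks, 1000 old rows: the old data fits in 800 new rows.
# No extra capacity appears until 640 of those 800 rows are copied,
# i.e. until X/(X+1) = 80% of the copy is done.
print(exposed_extra_rows(4, 639, 1000))  # 0
print(exposed_extra_rows(4, 641, 1000))  # 1
print(exposed_extra_rows(4, 800, 1000))  # 200
```

So with 4 data disks growing to 5, the extra space only starts trickling
in during the last fifth of the copy, as argued above.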

Note: After all the data has been copied, when the window of free space
has reached the end of the disks, there is still work to do. The window
of free space contains random data and needs to be resynced or zeroed so
the parity of the future stripes is correct. For the size to increase
gradually that resync/zeroing would have to be interleaved with copying
the remaining data.


2) With the existing reshape a gradual size increase is impossible
until late in the reshape. Could we do better?

The problem with increasing the size before the reshape is done is that
there is existing data where our new free space is going to be. Maybe we
could move that data out of the way as needed: whenever something writes
to a new stripe that still contains old data, we first move the old
stripe to its new place. That would require information about where old
data is, something like a bitmap. We might not get stripe granularity,
but that is OK.

It gets a bit more complex. Moving a chunk of old data means writing to
new stripes, and those can contain old data as well, requiring
recursion. But old data always gets copied to lower blocks, so assuming
some reshaping has been finished at the start of the disks (at least the
critical section must be done), the recursion eventually hits a region
that has already been reshaped. As the reshape progresses it will take
fewer and fewer recursion steps.

Note: reads from stripes with old data would return all zeros.

Note 2: a write to a stripe can go to the old stripe if that stripe
hasn't been reshaped yet.

Note 3: there would still be a normal reshape process that goes from
start to end on each disk, it would just run in parallel with the
on-demand copying.

Writing to a new stripe that hasn't been reshaped yet will be horribly
slow at first and gradually become faster as the reshape progresses.
Also, as more new stripes get written there will be more and more chunks
in the middle of the disks that have been reshaped, so the recursion
will not have to go all the way back to the fully reshaped region at the
start of the disk every time.
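The recursive move might look something like this toy sketch (Python;
`old_rows` stands in for the bitmap of not-yet-reshaped rows, parity is
again ignored, and all names are hypothetical):

```python
X = 4  # old data-disk count; the new layout has X + 1

def dest_rows(o):
    """New-layout rows that old row o's chunks land in.
    Old row o holds chunks o*X .. o*X + X - 1; in the new layout
    chunk c lives in row c // (X + 1)."""
    return range((o * X) // (X + 1), (o * X + X - 1) // (X + 1) + 1)

def relocate(o, old_rows, depth=0):
    """Move old row o into the new layout, first relocating any old
    data already sitting where it must land.  Destinations are always
    lower rows (for o >= X), so the recursion bottoms out in the
    already-reshaped region at the start of the disks.
    Returns the maximum recursion depth reached."""
    if o not in old_rows:
        return depth          # already reshaped, nothing to do
    old_rows.discard(o)       # row o's data is read out of its old home
    deepest = depth
    for d in dest_rows(o):
        if d in old_rows:
            deepest = max(deepest, relocate(d, old_rows, depth + 1))
    # here the real code would write row o's chunks into dest_rows(o)
    return deepest

# 100 rows, with the critical section (rows 0..X) already reshaped:
old_rows = set(range(X + 1, 100))
depth = relocate(90, old_rows)
```

Relocating row 90 walks a chain of still-old destination rows down to
the pre-reshaped region, which is exactly the "horribly slow at first"
case; once `old_rows` thins out, the same call returns almost
immediately.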


So what do you think? Is this the lack of caffeine speaking?

MfG
        Goswin