On Sat, 20 Feb 2010 06:27:54 +0100 Goswin von Brederlow <goswin-v-b@xxxxxx> wrote:
> Hi,
>
> last night I started a reshape of a raid5 array. Now I got up and it is
> still over 15 hours till the reshape is done. That got me thinking.
> Since I haven't had my coffee yet, let me use you as a sounding board.
>
>
> 1) Wouldn't it be great if during reshape the size of the raid would
> gradually increase?
>
> As the reshape progresses and data moves from X to X+1 disks it creates
> free space. So why can't one increase the device size gradually to
> include that space?
>
> Unfortunately the space it frees is needed to reshape the later
> stripes. As the reshape proceeds, a window of free space moves from the
> start of the disks to the end of the disks. For the device size to grow,
> the place where the new stripe would land after reshaping needs to be
> free, and that means the window of free space must have moved far enough
> to include that place. That means X/(X+1) of the data has already been
> copied. Only while copying the last 1/(X+1) of the data could the size
> increase. That would still be a plus.
>
> Note: After all the data has been copied, when the window of free space
> has reached the end of the disks, there is still work to do. The window
> of free space contains random data and needs to be resynced or zeroed so
> the parity of the future stripes is correct. For the size to increase
> gradually, that resync/zeroing would have to be interleaved with copying
> the remaining data.
>
>
> 2) With the existing reshape a gradual size increase is impossible
> until late in the reshape. Could we do better?
>
> The problem with increasing the size before the reshape is done is that
> there is existing data where our new free space is going to be. Maybe we
> could move the data away as needed. Whenever something writes to a new
> stripe that still contains old data, we move the old stripe to its new
> place. That would require information about where the old data is,
> something like a bitmap. We might not get stripe granularity, but that
> is ok.
>
> It gets a bit more complex. Moving a chunk of old data means writing
> data to new stripes. They can contain old data as well, requiring
> recursion. But old data always gets copied to lower blocks. Assuming we
> finished some reshaping at the start of the disks (at least some
> critical section must be done), then eventually we hit a region that was
> already reshaped. As the reshape progresses it will take fewer and fewer
> recursions.
>
> Note: reads from stripes with old data would return all 0.
>
> Note 2: writing to a stripe can write to the old stripe if that wasn't
> reshaped yet.
>
> Note 3: there would still be a normal reshape process that goes from
> start to end on each disk; it would just run in parallel with the
> on-demand copying.
>
> Writing to a new stripe that hasn't yet been reshaped will be horribly
> slow at first and gradually become faster as the reshape progresses.
> Also, as more new stripes get written, there will be more and more
> chunks in the middle of the disks that have been reshaped, so the
> recursion will not have to go to the fully reshaped region at the start
> of the disk every time.
>
>
> So what do you think? Is this the lack of caffeine speaking?

I think your cost-benefit trade-off is way off balance on the cost side.
This sort of complexity really belongs in a filesystem, not in a block
device.
NeilBrown
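
To make the X/(X+1) figure from point (1) of the quoted proposal concrete,
here is a rough toy model. It is only a sketch, not md code; X and D below
are arbitrary example values, and it ignores parity layout details and the
resync/zeroing the freed window would still need. It tracks the lowest
stripe row that still holds un-reshaped data and reports when the first
row beyond the reshaped data becomes free, which is the earliest point the
exported size could start to grow.

/*
 * Toy model of a raid5 reshape that spreads D data chunks from X to
 * X+1 data disks: find how much must be copied before the first stripe
 * row beyond the relocated data is free of old data, i.e. before the
 * exported size could start to grow.  Ignores parity and the fact that
 * the freed window still needs a resync/zeroing pass before use.
 */
#include <stdio.h>

int main(void)
{
	const long X = 4;          /* data chunks per stripe, old layout */
	const long D = 1000000;    /* total data chunks to relocate      */

	/* Rows the existing data occupies once fully reshaped; the
	 * "grown" part of the device would start at this row. */
	const long grown_row = (D + X) / (X + 1);

	for (long copied = 0; copied <= D; copied++) {
		long first_old_row = copied / X;             /* lowest row with old data */
		long new_rows = (copied + X) / (X + 1);      /* rows of reshaped data    */

		if (first_old_row > grown_row) {
			printf("first grown row is free after copying %ld of %ld chunks "
			       "(%.1f%%; X/(X+1) = %.1f%%), free window = %ld rows\n",
			       copied, D, 100.0 * copied / D,
			       100.0 * X / (X + 1), first_old_row - new_rows);
			return 0;
		}
	}
	printf("never\n");
	return 0;
}

With X = 4 this reports the threshold at 80% of the chunks copied, i.e.
X/(X+1), matching the figure in the quoted message; the freed window would
of course still need the resync/zeroing pass mentioned there before any of
it could actually be exposed.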