Re: Hot-replace for RAID5

On Wed, May 16, 2012 at 12:47 AM, NeilBrown <neilb@xxxxxxx> wrote:
> On Tue, 15 May 2012 21:39:10 +0200 Patrik Horník <patrik@xxxxxx> wrote:
>
>> BTW thank you very much for the fix for layout=preserve. As soon as
>> the current reshape finishes, I am going to do the other arrays.
>>
>> Are the regressions in 2.3.4 serious, and so to which version should I
>> apply the patch? Or, from when you looked at the code, should
>> layout=left-symmetric-6 work in 2.3.2?
>
> The regression isn't dangerous, just inconvenient (--add often doesn't work).
> --layout=left-symmetric-6 will work on 2.3.2, provided the current layout
> of the array is "left-symmetric", which I think is the default, but you should
> check.

OK, thanks. Yes, the layout of my arrays is left-symmetric.
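
For anyone else wanting to double-check the layout before using
--layout=left-symmetric-6, something like this works. It is only a small
sketch: it shells out to mdadm --detail and pulls the Layout line, and it
assumes the array is /dev/md0, so adjust the device name as needed.

    import subprocess

    # Print the current layout of the array (assumes mdadm is installed
    # and the array is /dev/md0 -- adjust as needed).
    out = subprocess.check_output(["mdadm", "--detail", "/dev/md0"]).decode()
    for line in out.splitlines():
        if "Layout" in line:
            print(line.strip())    # e.g. "Layout : left-symmetric"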

>
> NeilBrown
>
>>
>> Regarding reshaping speed, an estimate for doing things a lot more
>> sequentially gives much higher speeds. Let's say a 48 MB backup and 6
>> drives with 80 MB/s sequential speed. If you do the reshaping like this:
>> - Read 8 MB sequentially from each drive in parallel, 0.1 s
>> - Then write it to the backup, 48/80 = 0.6 s
>> - Calculate Q for something like 48 MB (guessing 0.05 s) and write it
>> back to the different drives in parallel in 0.1 s. Because it is in the
>> cache and you are only writing in this phase (?), there is no back and
>> forth seeking, and rotational latency applies only a couple of times
>> altogether, let's say 0.02 s.
>> - Update the superblock and move the head back, two worst-case seeks,
>> 0.03 s (I don't know how often you update the superblocks?)
>>
>> you process 8 MB in approx. 0.9 s, so the speed in this scenario should
>> be approx. 9 MB/s.
>>
>> I guess the main real difference when you logically do it in stripes
>> can be that while you are waiting for the chunk writes to complete
>> (are you waiting for the real completion of the writes?), the gap
>> between the first and the last drive is often long enough that you need
>> to wait one or more rotations before writing the next stripe. If that
>> is the case, you need to add approx. 128 * let's say 1.5 * 0.005 s =
>> 0.96 s, and so we are down to approx. 4.3 MB/s theoretically.
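
Spelling that arithmetic out (a rough sketch in Python; the per-step times
are the guesses above, the 128 chunk writes per 8 MB slice assume 64 KiB
chunks, and the speed is counted per drive, as in the estimate):

    # Rough sketch of the estimate above: 6 drives, 80 MB/s sequential each,
    # working in 8 MB-per-drive slices (48 MB total per cycle).
    drives = 6
    seq_speed = 80.0           # MB/s per drive
    slice_per_drive = 8.0      # MB read from each drive per cycle

    read_parallel = slice_per_drive / seq_speed           # 0.1 s
    write_backup = drives * slice_per_drive / seq_speed   # 0.6 s (48 MB to the backup)
    calc_q = 0.05                                         # guess
    write_back = slice_per_drive / seq_speed              # 0.1 s, all drives in parallel
    rotational = 0.02                                     # a few rotational latencies
    superblock = 0.03                                     # two worst-case seeks

    cycle = (read_parallel + write_backup + calc_q + write_back
             + rotational + superblock)
    print("sequential scheme: %.2f s per cycle, ~%.1f MB/s per drive"
          % (cycle, slice_per_drive / cycle))             # ~0.90 s, ~8.9 MB/s

    # Stripe-by-stripe penalty: ~128 chunk-sized writes per 8 MB slice,
    # each waiting ~1.5 rotations at ~5 ms per rotation.
    stripe_wait = 128 * 1.5 * 0.005                       # ~0.96 s
    print("with per-stripe rotational waits: ~%.1f MB/s per drive"
          % (slice_per_drive / (cycle + stripe_wait)))    # ~4.3 MB/s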
>>
>> Patrik
>>
>> On Tue, May 15, 2012 at 2:13 PM, NeilBrown <neilb@xxxxxxx> wrote:
>> > On Tue, 15 May 2012 13:56:58 +0200 Patrik Horník <patrik@xxxxxx> wrote:
>> >
>> >> Anyway, increasing it to 5K did not help, and the drives don't seem
>> >> to be fully utilized.
>> >>
>> >> Does the reshape work something like this:
>> >> - Read about X = 50M / ((N - 1) * stripe size) stripes from the
>> >> drives and write them to the backup file
>> >> - Reshape the X stripes one after another sequentially
>> >> - Reshape each stripe by reading the chunks from all drives,
>> >> calculating Q, and writing all the chunks back, doing the I/O for the
>> >> next stripe only after finishing the previous one?
>> >>
>> >> So after increasing stripe_cache_size the cache should hold the
>> >> stripes after backing them up, and so the reshape should not need to
>> >> read them from the drives again?
>> >>
>> >> Can't the slow speed be caused by some synchronization issue? How are
>> >> the stripes read for writing them to the backup file? Is it done one
>> >> by one, so that the I/Os for the next stripe are issued only after the
>> >> previous stripe has been read completely? Or are they issued in the
>> >> most parallel way possible?
>> >
>> > There is as much parallelism as I could manage.
>> > The backup file is divided into 2 sections.
>> > Write to one, then the other, then invalidate the first and write to it, etc.
>> > So while one half is being written, the data in the other half is being
>> > reshaped in the array.
>> > Also, the stripe reads are scheduled asynchronously, and as soon as a stripe
>> > is fully available, Q is calculated and the stripe is scheduled for writing.
>> >
>> > The slowness is due to continually having to seek back a little way to
>> > overwrite what has just been read, and also to having to update the metadata
>> > each time to record where we are up to.
>> >
>> > NeilBrown
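
In case the "two sections" flow above is hard to picture, here is a
self-contained toy sketch of the ordering it describes. The helpers are
toy stand-ins, not the actual md/mdadm code, and in the real code the
steps overlap: while one half of the backup file is being written, the
data already saved in the other half is being reshaped in the array.

    # Toy sketch of the two-half backup scheme: back a range up into one
    # half of the backup file, reshape it in the array, invalidate the
    # other half, then alternate.
    def read_stripes(array, pos, n):       # stand-in for the async stripe reads
        return array[pos:pos + n]

    def reshape_range(array, pos, n):      # stand-in: recompute Q, write stripes back
        print("reshaping %4d..%4d in the array" % (pos, pos + n))

    def run_reshape(array, half_size):
        backup = {0: None, 1: None}        # the backup file's two halves
        half = 0
        for pos in range(0, len(array), half_size):
            backup[half] = read_stripes(array, pos, half_size)
            print("backed up %4d..%4d into half %d" % (pos, pos + half_size, half))
            reshape_range(array, pos, half_size)
            backup[1 - half] = None        # invalidate the other half for reuse
            half = 1 - half

    run_reshape(list(range(1024)), 128)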
>> >
>> >
>> >>
>> >> Patrik
>> >>
>> >>
>> >> On Tue, May 15, 2012 at 1:28 PM, NeilBrown <neilb@xxxxxxx> wrote:
>> >> > On Tue, 15 May 2012 13:16:42 +0200 Patrik Horník <patrik@xxxxxx> wrote:
>> >> >
>> >> >> Can I increase it during reshape by echo N >
>> >> >> /sys/block/mdX/md/stripe_cache_size?
>> >> >
>> >> > Yes.
>> >> >
>> >> >
>> >> >>
>> >> >> How is the size determined? I have only 1027 while having 8 GB of system memory...
>> >> >
>> >> > Not very well.
>> >> >
>> >> > It is set to 256, or the minimum size needed to allow the reshape to proceed
>> >> > (which means about 4 chunks worth).  I should probably add some auto-sizing
>> >> > but that sort of stuff is hard :-(
>> >> >
>> >> > NeilBrown
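
For what it's worth, the arithmetic behind those numbers, as a rough
sketch. It assumes 4 KiB pages, 6 member drives, and a 1 MiB chunk; the
chunk size is only a guess, picked because it reproduces the 1027 reported
above, and the memory figures rely on stripe_cache_size counting cache
entries of one page per member device.

    # Stripe-cache sizing sketch (assumptions: 4 KiB pages, 6 drives,
    # 1 MiB chunk -- the chunk size is a guess).
    page = 4096
    devices = 6
    chunk = 1024 * 1024

    # "about 4 chunks worth" of entries, each entry covering one page per device
    min_for_reshape = 4 * chunk // page
    print("minimum entries for the reshape: ~%d" % min_for_reshape)    # 1024

    # Approximate memory cost of various stripe_cache_size settings
    for entries in (256, 1027, 8192, 32768):
        mem = entries * page * devices
        print("stripe_cache_size=%5d -> ~%d MiB of stripe cache"
              % (entries, mem // (1024 * 1024)))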
>> >> >
>> >
>

