BTW thank you very much for the fix for layout=preserve. As soon as the current reshape finishes, I am going to move on to the other arrays. Are the regressions in 2.3.4 serious, i.e. to which version should I apply the patch? Or, from when you looked at the code, should layout=left-symmetric-6 work in 2.3.2?

Regarding reshape speed: an estimate assuming the work is done much more sequentially gives much higher speeds. Let's say a 48 MB backup and 6 drives with 80 MB/s sequential speed. If you do the reshape like this:

- Read 8 MB sequentially from each drive in parallel: 0.1 s.
- Write it all to the backup file: 48 / 80 = 0.6 s.
- Calculate Q for the ~48 MB (guessing 0.05 s) and write it back to the drives in parallel in 0.1 s. Because the data is in the cache and you are only writing in this phase (?), there is no back-and-forth seeking, and rotational latency applies only a couple of times altogether, say 0.02 s.
- Update the superblock and move the heads back: two worst-case seeks, 0.03 s. (I don't know how often you update the superblocks?)

That processes 8 MB per drive in about 0.9 s, so the speed in this scenario should be about 9 MB/s.

I guess the main real difference when you do it logically stripe by stripe is that while you wait for the chunk writes to complete (are you waiting for the real completion of the writes?), the gap between the first and the last drive is often long enough that you have to wait one or more rotations before writing the next stripe. If that is the case, you need to add about 128 * (let's say) 1.5 * 0.005 s = 0.96 s, and we are down to about 4.3 MB/s theoretically.
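Just to show the arithmetic, here it is as a quick Python script (a rough sketch of the estimate above only; the Q-calculation time, the latency allowances and the per-stripe rotation penalty are my guesses, not measurements):

    # Back-of-envelope estimate of the reshape speed: 48 MB backup window,
    # 6 drives at 80 MB/s sequential. All the small timing constants are guesses.

    drives = 6
    seq_mb_s = 80.0                      # sequential throughput per drive, MB/s
    window_mb = 48.0                     # size of the backup window
    per_drive_mb = window_mb / drives    # 8 MB read/written per drive

    t_read = per_drive_mb / seq_mb_s     # parallel sequential read from all drives: 0.1 s
    t_backup = window_mb / seq_mb_s      # write the whole window to the backup file: 0.6 s
    t_q = 0.05                           # guessed time to calculate Q for ~48 MB
    t_write = per_drive_mb / seq_mb_s    # parallel write back to the drives: 0.1 s
    t_rot = 0.02                         # guessed rotational latency, only a couple of times
    t_super = 0.03                       # superblock update + moving the heads back (two seeks)

    t_seq = t_read + t_backup + t_q + t_write + t_rot + t_super
    print("sequential scenario: %.2f s per window, ~%.1f MB/s per drive"
          % (t_seq, per_drive_mb / t_seq))                 # ~0.90 s, ~8.9 MB/s

    # If instead every stripe waits ~1.5 extra rotations (5 ms each) for the
    # slowest drive, add that for the 128 stripes in an 8 MB-per-drive window:
    t_stripe_wait = 128 * 1.5 * 0.005                      # ~0.96 s
    print("stripe-by-stripe scenario: ~%.1f MB/s per drive"
          % (per_drive_mb / (t_seq + t_stripe_wait)))      # ~4.3 MB/s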
Patrik

On Tue, May 15, 2012 at 2:13 PM, NeilBrown <neilb@xxxxxxx> wrote:
> On Tue, 15 May 2012 13:56:58 +0200 Patrik Horník <patrik@xxxxxx> wrote:
>
>> Anyway, increasing it to 5K did not help and the drives don't seem to be
>> fully utilized.
>>
>> Does the reshape work something like this:
>> - Read about X = (50M / (N - 1) / stripe size) stripes from the drives and
>>   write them to the backup file.
>> - Reshape those X stripes one after another, sequentially.
>> - Reshape each stripe by reading its chunks from all drives, calculating Q,
>>   writing all chunks back, and issuing I/O for the next stripe only after
>>   finishing the previous one?
>>
>> So after increasing stripe_cache_size, the cache should hold the stripes
>> after backing them up, and the reshape should not need to read them from
>> the drives again?
>>
>> Can't the slow speed be caused by some synchronization issue? How are
>> the stripes read for writing them to the backup file? Is it done one by
>> one, so that I/Os for the next stripe are issued only after the previous
>> stripe has been read completely? Or are they issued in the most parallel
>> way possible?
>
> There is as much parallelism as I could manage.
> The backup file is divided into 2 sections.
> Write to one, then the other, then invalidate the first and write to it, etc.
> So while one half is being written, the data in the other half is being
> reshaped in the array.
> Also, the stripe reads are scheduled asynchronously, and as soon as a stripe
> is fully available, the Q is calculated and the writes are scheduled.
>
> The slowness is due to continually having to seek back a little way to
> overwrite what has just been read, and also having to update the metadata
> each time to record where we are up to.
>
> NeilBrown
>
>> Patrik
>>
>> On Tue, May 15, 2012 at 1:28 PM, NeilBrown <neilb@xxxxxxx> wrote:
>> > On Tue, 15 May 2012 13:16:42 +0200 Patrik Horník <patrik@xxxxxx> wrote:
>> >
>> >> Can I increase it during reshape by
>> >> echo N > /sys/block/mdX/md/stripe_cache_size?
>> >
>> > Yes.
>> >
>> >> How is the size determined? I have only 1027 while having 8 GB of
>> >> system memory...
>> >
>> > Not very well.
>> >
>> > It is set to 256, or the minimum size needed to allow the reshape to
>> > proceed (which means about 4 chunks' worth). I should probably add some
>> > auto-sizing, but that sort of stuff is hard :-(
>> >
>> > NeilBrown
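PS: to put the stripe_cache_size numbers in perspective, my understanding is that each cache entry costs roughly one 4 KiB page per member device, so even a much larger cache is tiny next to 8 GB of RAM. A quick Python sketch based on that assumption (the one-page-per-device figure is my reading of how the raid5 stripe cache works, not something stated in this thread):

    # Rough memory cost of the raid5/6 stripe cache, assuming one 4 KiB page
    # per member device per cached stripe (my understanding, not verified here).
    PAGE_KIB = 4

    def stripe_cache_mib(entries, devices):
        return entries * devices * PAGE_KIB / 1024.0

    # values from this thread: the default 256, my current 1027, and the ~5K
    # I tried (taken as 5000), all on a 6-drive array
    for entries in (256, 1027, 5000):
        print("%5d entries x 6 drives ~ %6.1f MiB"
              % (entries, stripe_cache_mib(entries, 6)))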