On Tue, 15 May 2012 12:11:28 +0200 Patrik Horník <patrik@xxxxxx> wrote:

> Neil,
>
> did you have a chance to look at how to migrate from raid5 to raid6
> without reshaping and/or why layout=preserve did not work?

Yes.
http://neil.brown.name/git?p=mdadm;a=commitdiff;h=385167f364122c9424aa3d56f00b8c0874ce78b8
fixes it. --layout=preserve works properly after that patch.

>
> Regarding failing a drive during the reshape, I was worried, because I
> found some mentions of problems in mailing lists from 1-2 years ago, like
> a non-functional backup-file after failing a drive or worse... But I
> tested it on a test array and it worked, so I did it.

testing == good !!

>
> Now I am getting a constant speed of 2.3 MB/s. Is it not too slow? It is
> not CPU constrained, it is I/O. But nothing else is going on the
> drives, they are all modern drives, the backup is now on a different drive,
> so if it is sequential enough it should be much higher. What should be
> the pattern of I/O operations it uses? It is a 7 x HDD RAID5 to RAID6
> migration, chunk size is 64K, backup file is about 50M.

Yes, it is painfully slow.
It reads from the array and writes to the backup. Then it allows the reshape
to progress, which might read from the array again, and writes to the array.
It is doing this in 50M blocks.
How big is the stripe cache - /sys/block/md0/md/stripe_cache_size ??
To hold 50M it needs 50M/4K/6 == 2133 entries. And it might need to hold it
twice - once for the old layout and once for the new. So try increasing it
to about 5000 if it isn't there already. That might reduce the reads and
allow it to flow more smoothly.

NeilBrown

>
> Thanks.
>
> Patrik
>
> On Mon, May 14, 2012 at 2:52 AM, Patrik Horník <patrik@xxxxxx> wrote:
> > Well,
> >
> > I used raid=noautodetect and the other arrays did start automatically.
> > I am not sure who started them, maybe initscripts... But the one
> > which is reshaping thankfully did not start.
> >
> > Unfortunately the speed is not much better.
> > The top speed is up by about a third to maybe 2.3 MB/s, which seems
> > pretty small, and I am unable to quickly pinpoint the exact reason.
> > Do you have any idea what it can be and how to improve the speed?
> >
> > In addition the performance problem with the bad drive periodically kicks
> > in sooner and thus the average speed is almost the same, around 0.8 to
> > 0.9 MB/s. I am thinking about failing the problematic drive. Apart from
> > ending up without redundancy for the not yet reshaped part, should
> > failing it work as expected even in the state the array is in now?
> > (raid6 with 8 drives, 7 active devices in the not yet reshaped part,
> > stopped and started a couple of times with the backup-file.)
> >
> > Thanks.
> >
> > Patrik
> >
> > On Mon, May 14, 2012 at 12:15 AM, NeilBrown <neilb@xxxxxxx> wrote:
> >> On Sun, 13 May 2012 23:41:35 +0200 Patrik Horník <patrik@xxxxxx> wrote:
> >>
> >>> Hi Neil,
> >>>
> >>> I decided to move the backup file to another device. I stopped the array,
> >>> mdadm stopped it but wrote "mdadm: failed to unfreeze array". What
> >>> does it exactly mean? I don't want to proceed until I am sure it does
> >>> not signal an error.
> >>
> >> That would appear to be a minor bug in mdadm - I've made a note.
> >>
> >> When reshaping an array like this, the 'mdadm' which started the reshape
> >> forks and continues in the background managing the backup file.
> >> When it exits, having completed, it makes sure that the array is 'unfrozen'
> >> just to be safe.
> >> However if it exits because the array was stopped, there is no array to
> >> unfreeze and it gets a little confused.
> >> So it is a bug but it does not affect the data on the devices or indicate
> >> that anything serious went wrong when stopping the array.
> >>
> >>>
> >>> I quickly checked the sources and it seems to be related to some sysfs
> >>> resources, but I am not sure. But the array disappeared from
> >>> /sys/block/.
> >>
> >> Exactly. And as the array disappeared, it really has stopped.
> >>
> >>
> >>>
> >>> Thanks.
> >>>
> >>> Patrik
> >>>
> >>> On Sun, May 13, 2012 at 9:43 AM, Patrik Horník <patrik@xxxxxx> wrote:
> >>> > Hi Neil,
> >>> >
> >>> > On Sun, May 13, 2012 at 1:19 AM, NeilBrown <neilb@xxxxxxx> wrote:
> >>> >> On Sat, 12 May 2012 17:56:04 +0200 Patrik Horník <patrik@xxxxxx> wrote:
> >>> >>
> >>> >>> Neil,
> >>> >>
> >>> >> Hi Patrik,
> >>> >> sorry about the "--layout=preserve" confusion. I was a bit hasty.
> >>> >> "--layout=left-symmetric-6" would probably have done what was wanted, but it
> >>> >> is a bit late for that :-(
> >>> >
> >>> > --layout=preserve is also mentioned in the md or mdadm
> >>> > documentation... So is it not the right one?
> >>
> >> It should be ... I think. But it definitely seems not to work. I only have
> >> a vague memory of how it was meant to work so I'll have to review the code
> >> and add some proper self-tests.
> >>
> >>> >
> >>> >>>
> >>> >>>
> >>> >>> so I further analyzed the behaviour and I found the following:
> >>> >>>
> >>> >>> - The bottleneck of about 1.7 MB/s is probably caused by the backup file
> >>> >>> being on one of the drives. That drive is utilized almost 80% according
> >>> >>> to iostat -x and its avg queue length is almost 4 while having await
> >>> >>> under 50 ms.
> >>> >>>
> >>> >>> - The variable speed and low speeds down to 100 KB/s are caused by
> >>> >>> problems on the drive I suspected as problematic. Its service time is
> >>> >>> sometimes going above 1 sec. Total avg speed is about 0.8 MB/s. (I
> >>> >>> tested the read speed on it by running a check of the array and it worked
> >>> >>> at 30 MB/s. And because preserve should only read from it I did not
> >>> >>> specifically test its write speed.)
> >>> >>>
> >>> >>> So my questions are:
> >>> >>>
> >>> >>> - Is there a way I can move backup_file to another drive 100% safely? To
> >>> >>> add another non-network drive I need to restart the server. I can then
> >>> >>> boot it to some live distribution, for example, to 100% prevent
> >>> >>> automatic assembly.
> >>> >>> I think the speed should be a couple of times higher.
> >>> >>
> >>> >> Yes.
> >>> >> If you stop the array, then copy the backup file, then re-assemble the
> >>> >> array giving it the backup file in the new location, all should be well.
> >>> >> A reboot while the array is stopped is not a problem.
> >>> >
> >>> > Should or will? :) I have 0.90, now 0.91, metadata, is everything
> >>> > needed stored there? Should mdadm 3.2.2-1~bpo60+2 from
> >>> > squeeze-backports work well? Or should I compile mdadm 3.2.4?
> >>
> >> "Will" requires clairvoyance :-)
> >> 0.91 is the same as 0.90, except that the array is in the middle of a reshape.
> >> This makes sure that old kernels which don't know about reshape never try to
> >> start the array.
> >> Yes - everything you need is stored in the 0.91 metadata and the backup file.
> >> After a clean shutdown, you could manage without the backup file if you had
> >> to, but as you have it, that isn't an issue.
> >>
> >>> >
> >>> > In case there is some risk involved I will need to choose between
> >>> > waiting and risking a power outage happening sometime in the following
> >>> > week (we have something like storm season here) and risking this...
> >>
> >> There is always risk.
> >> I think you made a wise choice in choosing to move the backup file.
> >>
> >>> >
> >>> > Do you recommend some live linux distro installable on USB which is
> >>> > good for this? (One that has the newest versions and doesn't try to
> >>> > assemble arrays.)
> >>
> >> No. Best to use whatever you are familiar with.
> >>
> >>
> >>> >
> >>> > Or will automatic assembly fail and cause no problem at all
> >>> > for sure? (According to the md or mdadm doc this should be the case.) In
> >>> > that case can I use the distribution on the server, Debian stable plus
> >>> > some packages from squeeze, for that? Possibly with added
> >>> > raid=noautodetect? I have LVM on top of the raid arrays and I don't want
> >>> > to cause a mess. The OS is not on LVM or raid.
> >>> >
> >>
> >> raid=noautodetect is certainly a good idea. I'm not sure if the in-kernel
> >> autodetect will try to start a reshaping raid - I hope not.
> >>
> >>> >>>
> >>> >>> - Is it safe to fail and remove the problematic drive? The array will be
> >>> >>> down to 6 from 8 drives in the part where it is not reshaped. It should
> >>> >>> double the speed.
> >>> >>
> >>> >> As safe as it ever is to fail a device in a non-degraded array.
> >>> >> i.e. it would not cause a problem directly but of course if you get an error
> >>> >> on another device, that would be awkward.
> >>> >
> >>> > I actually "check"-ed this raid array a couple of times a few days ago and
> >>> > the data on the other drives was OK. The problematic drive reported a
> >>> > couple of read errors, always corrected with data from the other drives
> >>> > and by rewriting.
> >>
> >> That is good!
> >>
> >>> >
> >>> > About that, should this reshaping work OK if it encounters possible
> >>> > read errors on the problematic drive? Will it use data from the other
> >>> > drives to correct that also in this reshaping mode?
> >>
> >> As long as there are enough working drives to be able to read and write the
> >> data, the reshape will continue.
> >>
> >> NeilBrown
> >>
> >>
> >>> >
> >>> > Thanks.
> >>> >
> >>> > Patrik
> >>> >
> >>> >>>
> >>> >>> - Why did mdadm ignore layout=preserve? I have other arrays in that
> >>> >>> server in which I need to replace a drive.
> >>> >>
> >>> >> I'm not 100% sure - what version of mdadm are you using?
> >>> >> If it is 3.2.4, then maybe commit 0073a6e189c41c broke something.
> >>> >> I'll add a test for this to the test suite to make sure it doesn't break again.
> >>> >> But you are using 3.2.2 .... Not sure. I'd have to look more closely.
> >>> >>
> >>> >> Using --layout=left-symmetric-6 should work, though testing on some
> >>> >> /dev/loop devices first is always a good idea.
> >>> >>
> >>> >> NeilBrown
> >>> >>
> >>> >>
> >>
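
The stripe-cache estimate in the reply at the top of this message (50M/4K/6 == 2133 entries, possibly needed twice) can be reproduced with a few lines of Python. This is only a sketch of that arithmetic, not anything taken from mdadm or the kernel. The function name and its parameters are made up for illustration, and it assumes that the 4K in the formula is the page size and that the 6 is the number of data drives in the 7-drive RAID5.

```python
MIB = 1024 * 1024


def stripe_cache_entries(backup_bytes, page_bytes=4096, data_drives=6, copies=1):
    """Hypothetical helper mirroring the 50M/4K/6 estimate from the thread.

    Assumption: one cache entry covers one page on each data drive, so an
    entry holds page_bytes * data_drives bytes of array data.
    """
    # Integer division, to match the rounded-down figure quoted in the reply.
    per_copy = backup_bytes // page_bytes // data_drives
    return per_copy * copies


one_layout = stripe_cache_entries(50 * MIB)             # old layout only
both_layouts = stripe_cache_entries(50 * MIB, copies=2)  # old and new layout
print(one_layout, both_layouts)
```

With a 50M backup window the two-layout case comes to a little over 4000 entries, which is consistent with the suggestion to raise /sys/block/md0/md/stripe_cache_size to about 5000.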