Neil, I analyzed the behaviour further and found the following:

- The bottleneck of roughly 1.7 MB/s is probably caused by the backup
  file sitting on one of the drives: according to iostat -x that drive
  is utilized at almost 80% and its average queue length is almost 4,
  while its await stays under 50 ms.

- The variable speed, dropping as low as 100 KB/s, is caused by
  problems on the drive I already suspected as problematic; its service
  time sometimes goes above 1 second.  The total average speed is about
  0.8 MB/s.  (I tested its read speed by running a check of the array
  and it managed 30 MB/s.  Because --layout=preserve should only read
  from it, I did not specifically test its write speed.)

So my questions are:

- Is there a way I can move the backup_file to another drive 100%
  safely?  To add another non-network drive I need to restart the
  server; I can then boot it into some live distribution, for example,
  to completely prevent automatic assembly.  I think the speed should
  then be several times higher.

- Is it safe to fail and remove the problematic drive?  The array would
  be down to 6 of 8 drives in the part that is not yet reshaped.  It
  should double the speed.

- Why did mdadm ignore --layout=preserve?  I have other arrays in that
  server in which I need to replace a drive.

Thanks.

Patrik

On Sat, May 12, 2012 at 6:40 AM, Patrik Horník <patrik@xxxxxx> wrote:
> Neil, the migration to RAID6 is unfortunately not working as expected.
>
> I added a spare and ran mdadm --grow /dev/md6 --level 6
> --layout=preserve, but I guess it ignored layout=preserve.
>
> It asked for a backup_file and is now writing the same amount of data
> on all drives.  I can maybe live with that, even though it is a little
> risky because I suspect one of the drives is not OK.  But the problem
> is that I thought the backup_file was only used for some critical
> section, so I gave it a backup_file located on one of the drives used
> in the array.  It is of course not on a partition that is in the
> array, but it seems to be the I/O bottleneck.  The reshape speed is
> not constant, varies between 100 KB/s and 1.6 MB/s, and it looks like
> it will take more than a week, maybe two.
>
> It is kernel 3.2.0 amd64 and mdadm 3.2.2 from squeeze backports; the
> array had seven drives and now has eight.
>
> What additional info do you need to diagnose the problem?  I am not
> yet 100% sure the bottleneck is the backup file, but it looks like it
> from iostat -d.  Is there anything I can do about it?  (Like stopping
> the reshape and changing the backup file.  To do that I would need to
> restart the server, and I need the operation to be 100% safe.)
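(As an aside on the measurements mentioned above: the utilization,
queue-length and service-time figures come from extended iostat
output.  A minimal sketch of that kind of check, assuming the array
members are sda3-sdh3 and the backup file sits on a hypothetical
/dev/sdi - all device names here are illustrative only:

    # extended per-device statistics, refreshed every 5 seconds
    iostat -x 5 /dev/sd[a-i]

A member pinned near 80-100% utilization with an average queue length
around 4 is simply saturated, while svctm or await climbing above a
second usually points at a drive in trouble rather than one that is
merely busy.)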
>
> Here is the output of mdadm --detail:
>
>         Version : 0.91
>   Creation Time : Tue Aug 18 14:51:41 2009
>      Raid Level : raid6
>      Array Size : 2933388288 (2797.50 GiB 3003.79 GB)
>   Used Dev Size : 488898048 (466.25 GiB 500.63 GB)
>    Raid Devices : 8
>   Total Devices : 8
> Preferred Minor : 6
>     Persistence : Superblock is persistent
>
>     Update Time : Sat May 12 06:37:48 2012
>           State : clean, degraded, reshaping
>  Active Devices : 7
> Working Devices : 8
>  Failed Devices : 0
>   Spare Devices : 1
>
>          Layout : left-symmetric-6
>      Chunk Size : 64K
>
>  Reshape Status : 0% complete
>      New Layout : left-symmetric
>
>            UUID : d8e679a2:5d6fa7a7:2e406ee4:439be8d3
>          Events : 0.983549
>
>     Number   Major   Minor   RaidDevice   State
>        0       8      115        0        active sync   /dev/sdh3
>        1       8       67        1        active sync   /dev/sde3
>        2       8       99        2        active sync   /dev/sdg3
>        3       8       83        3        active sync   /dev/sdf3
>        4       8        3        4        active sync   /dev/sda3
>        5       8       19        5        active sync   /dev/sdb3
>        6       8       35        6        active sync   /dev/sdc3
>        7       8       51        7        spare rebuilding   /dev/sdd3
>
>
> Patrik
>
>
> On Fri, May 11, 2012 at 9:16 AM, David Brown <david.brown@xxxxxxxxxxxx> wrote:
>> Just in case you missed it earlier...
>>
>> Remember to take a backup before you start this!
>>
>> Also make notes of things like the "mdadm --detail" output, version
>> numbers, the exact commands executed, etc. (and store this
>> information on another computer!)  If something does go wrong, that
>> information can make it much easier for Neil or others to advise you.
>>
>> mvh.,
>>
>> David
>>
>>
>> On 11/05/2012 04:44, Patrik Horník wrote:
>>>
>>> On Fri, May 11, 2012 at 2:50 AM, NeilBrown <neilb@xxxxxxx> wrote:
>>>>
>>>> On Thu, 10 May 2012 19:16:59 +0200 Patrik Horník <patrik@xxxxxx> wrote:
>>>>
>>>>> Neil, can you please comment on whether the separate operations
>>>>> mentioned in this process behave as we expect and are stable
>>>>> enough?  Thanks.
>>>>
>>>> The conversion to and from RAID6 as described should work as
>>>> expected, though it requires having an extra device and requires
>>>> two 'recovery' cycles.  Specifying the number of --raid-devices is
>>>> not necessary.  When you convert RAID5 to RAID6, mdadm assumes you
>>>> are increasing the number of devices by 1 unless you say otherwise.
>>>> Similarly, with RAID6->RAID5 the assumption is a decrease by 1.
>>>>
>>>> Doing an in-place reshape with the new 3.3 code should work, though
>>>> with a softer "should" than above.  We will only know that it is
>>>> "stable" when enough people (such as yourself) try it and report
>>>> success.  If anything does go wrong I would of course help you to
>>>> put the array back together, but I can never guarantee no data
>>>> loss.  You wouldn't be the first to test the code on live data, but
>>>> you would be the second that I have heard of.
>>>
>>> Thanks Neil, this answers my questions.  I don't like being second,
>>> so RAID5 - RAID6 - RAID5 it is... :)
>>>
>>> In addition, my array has 0.9 metadata, so hot-replace would also
>>> require a metadata conversion; all together it seems much riskier.
>>>
>>>> The in-place reshape is not yet supported by mdadm but it is very
>>>> easy to manage directly.  Just
>>>>    echo replaceable > /sys/block/mdXXX/md/dev-YYY/state
>>>> and as soon as a spare is available the replacement will happen.
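(Purely to make Neil's suggestion above concrete, and only as an
untested sketch: for a hypothetical array md0 whose worn-out member is
sdb1, the in-place replacement would look something like

    # add the disk that should take over from the failing member
    mdadm /dev/md0 --add /dev/sdc1

    # mark the old member for replacement; md copies its data onto the
    # spare while the old device stays active until the copy finishes
    echo replaceable > /sys/block/md0/md/dev-sdb1/state

The device names are made up, and the sysfs keyword is taken verbatim
from Neil's message about the 3.3 code, so check Documentation/md.txt
for the kernel actually being run before relying on it.)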
>>>>
>>>> NeilBrown
>>>>
>>>>
>>>>> On Thu, May 10, 2012 at 8:59 AM, David Brown <david.brown@xxxxxxxxxxxx> wrote:
>>>>>>
>>>>>> (I accidentally sent my first reply directly to the OP, and forgot
>>>>>> the mailing list - I'm adding it back now, because I don't want the
>>>>>> OP to follow my advice until others have confirmed or corrected it!)
>>>>>>
>>>>>> On 09/05/2012 21:53, Patrik Horník wrote:
>>>>>>>
>>>>>>> Great suggestion, thanks.
>>>>>>>
>>>>>>> So I guess the steps with exact parameters should be:
>>>>>>>   1. add spare S to the RAID5 array
>>>>>>>   2. mdadm --grow /dev/mdX --level 6 --raid-devices N+1 --layout=preserve
>>>>>>>   3. remove the faulty drive and add its replacement, let it synchronize
>>>>>>>   4. possibly remove the added spare S
>>>>>>>   5. mdadm --grow /dev/mdX --level 5 --raid-devices N
>>>>>>
>>>>>> Yes, that's what I was thinking.  You are missing "2b - let it
>>>>>> synchronise".
>>>>>
>>>>> Sure :)
>>>>>
>>>>>> Of course, another possibility is that if you have the space in the
>>>>>> system for another drive, you may want to convert to a full raid6
>>>>>> for the future.  That way you have the extra safety built in in
>>>>>> advance.  But that will definitely lead to a re-shape.
>>>>>
>>>>> Actually I don't have free physical space; the array already has 7
>>>>> drives.  For this process I will need to place the additional drive
>>>>> on a table next to the PC and cool it with a free-standing fan... :)
>>>>>
>>>>>>> My questions:
>>>>>>> - Are you sure steps 3, 4 and 5 would not cause reshaping?
>>>>>>
>>>>>> I /believe/ it will avoid a reshape, but I can't say I'm sure.  This
>>>>>> is stuff that I only know about in theory, and have not tried in
>>>>>> practice.
>>>>>>
>>>>>>> - My array now has the left-symmetric layout, so after migration to
>>>>>>> RAID6 it should be left-symmetric-6.  Does RAID6 work without
>>>>>>> problems in degraded mode with this layout, no matter which one or
>>>>>>> two drives are missing?
>>>>>>
>>>>>> The layout will not affect the redundancy or the features of the
>>>>>> raid - it will only (slightly) affect the speed of some operations.
>>>>>
>>>>> I know it should work, but it is probably a configuration that is
>>>>> not used much, so maybe it is not tested as thoroughly as the
>>>>> standard layouts.  So the question was aimed more at practical
>>>>> experience and stability...
>>>>>
>>>>>>> - What happens in step 5 and how long does it take?  (If it is
>>>>>>> without reshaping, it should only update the superblocks and
>>>>>>> that's it.)
>>>>>>
>>>>>> That is my understanding.
>>>>>>
>>>>>>> - What happens if I don't remove spare S before migrating back to
>>>>>>> RAID5?  Will the array be reshaped, and which drive will it turn
>>>>>>> into a spare?  (If step 5 is instantaneous, there is no reason for
>>>>>>> that.  But if it takes time, it is probably safer.)
>>>>>>
>>>>>> I /think/ that the extra disk will turn into a hot spare.  But I am
>>>>>> getting out of my depth here - it all depends on how the disks get
>>>>>> numbered and how that affects the layout, and I don't know the
>>>>>> details here.
>>>>>>
>>>>>>> So all in all, what do you guys think is more reliable now, the new
>>>>>>> hot-replace or these steps?
>>>>>>
>>>>>> I too am very curious to hear opinions.
>>>>>> Hot-replace will certainly be much simpler and faster than these
>>>>>> sorts of re-shaping - it's exactly the sort of situation the
>>>>>> feature was designed for.  But I don't know if it is considered
>>>>>> stable and well-tested, or "bleeding edge".
>>>>>>
>>>>>> mvh.,
>>>>>>
>>>>>> David
>>>>>>
>>>>>>> Thanks.
>>>>>>>
>>>>>>> Patrik
>>>>>>>
>>>>>>> On Wed, May 9, 2012 at 8:09 AM, David Brown <david.brown@xxxxxxxxxxxx> wrote:
>>>>>>>> On 08/05/12 11:10, Patrik Horník wrote:
>>>>>>>>>
>>>>>>>>> Hello guys,
>>>>>>>>>
>>>>>>>>> I need to replace a drive in a big production RAID5 array and I
>>>>>>>>> am thinking about using the new hot-replace feature added in
>>>>>>>>> kernel 3.3.
>>>>>>>>>
>>>>>>>>> Does anyone have experience with it on big RAID5 arrays?  Mine
>>>>>>>>> is 7 * 1.5 TB.  What do you think about its status / stability /
>>>>>>>>> reliability?  Do you recommend it for production data?
>>>>>>>>>
>>>>>>>>> Thanks.
>>>>>>>>
>>>>>>>> If you don't want to play with "bleeding edge" features, you
>>>>>>>> could add the disk and extend the array to RAID6, then remove the
>>>>>>>> old drive.  I think if you want to do it all without doing any
>>>>>>>> re-shapes, however, then you'd need a third drive (the extra
>>>>>>>> drive could easily be an external USB disk if needed - it will
>>>>>>>> only be used for writing, and not for reading unless there's
>>>>>>>> another disk failure).  Start by adding the extra drive as a hot
>>>>>>>> spare, then re-shape your raid5 to raid6 in the raid5+extra
>>>>>>>> parity layout.  Then fail and remove the old drive.  Put the new
>>>>>>>> drive into the box and add it as a hot spare.  It should
>>>>>>>> automatically take its place in the raid5, replacing the old one.
>>>>>>>> Once it has been rebuilt, you can fail and remove the extra
>>>>>>>> drive, then re-shape back to raid5.
>>>>>>>>
>>>>>>>> If things go horribly wrong, the external drive gives you your
>>>>>>>> parity protection.
>>>>>>>>
>>>>>>>> Of course, don't follow this plan until others here have
>>>>>>>> commented on it, and either corrected or approved it.
>>>>>>>>
>>>>>>>> And make sure you have a good backup no matter what you decide
>>>>>>>> to do.
>>>>>>>>
>>>>>>>> mvh.,
>>>>>>>>
>>>>>>>> David
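(To collect the steps discussed in this thread in one place, here is
an untested sketch of the RAID5 -> RAID6 -> RAID5 route, written for a
hypothetical array /dev/md0 with a temporary extra drive sdi1, a
failing member sdb1 and its replacement sdj1 - all names are
illustrative, and the backup file, if mdadm asks for one, should live
on a disk that is not a member of the array:

    # 1. add the temporary extra drive as a spare
    mdadm /dev/md0 --add /dev/sdi1

    # 2. convert to RAID6 while keeping the RAID5-style parity layout;
    #    mdadm assumes raid-devices grows by one, per Neil's note above
    mdadm --grow /dev/md0 --level 6 --layout=preserve
    #    wait for the recovery to finish (watch /proc/mdstat)

    # 3. fail and remove the bad drive, add its replacement, let it rebuild
    mdadm /dev/md0 --fail /dev/sdb1 --remove /dev/sdb1
    mdadm /dev/md0 --add /dev/sdj1

    # 4. once rebuilt, fail and remove the temporary extra drive
    mdadm /dev/md0 --fail /dev/sdi1 --remove /dev/sdi1

    # 5. drop back to RAID5; a decrease of raid-devices by one is assumed
    mdadm --grow /dev/md0 --level 5

This only restates the plan above and is not a verified recipe; as the
top of this thread shows, --layout=preserve may not behave as
expected, so verify the layout with mdadm --detail after step 2 before
failing anything.)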