On Saturday January 21, akropel1@xxxxxxxxxxxxxxxx wrote: > NeilBrown <neilb@xxxxxxx> wrote: > > In line with the principle of "release early", following are 5 patches > > against md in 2.6.latest which implement reshaping of a raid5 array. > > By this I mean adding 1 or more drives to the array and then re-laying > > out all of the data. > > I've been looking forward to a feature like this, so I took the > opportunity to set up a vmware session and give the patches a try. I > encountered both success and failure, and here are the details of both. > > On the first try I neglected to read the directions and increased the > number of devices first (which worked) and then attempted to add the > physical device (which didn't work; at least not the way I intended). > The result was an array of size 4, operating in degraded mode, with > three active drives and one spare. I was unable to find a way to coax > mdadm into adding the 4th drive as an active device instead of a > spare. I'm not an mdadm guru, so there may be a method I overlooked. > Here's what I did, interspersed with trimmed /proc/mdstat output: Thanks, this is exactly the sort of feedback I was hoping for - people testing thing that I didn't think to... > > mdadm --create -l5 -n3 /dev/md0 /dev/sda /dev/sdb /dev/sdc > > md0 : active raid5 sda[0] sdc[2] sdb[1] > 2097024 blocks level 5, 64k chunk, algorithm 2 [3/3] [UUU] > > mdadm --grow -n4 /dev/md0 > > md0 : active raid5 sda[0] sdc[2] sdb[1] > 3145536 blocks level 5, 64k chunk, algorithm 2 [4/3] [UUU_] I assume that no "resync" started at this point? It should have done. > > mdadm --manage --add /dev/md0 /dev/sdd > > md0 : active raid5 sdd[3](S) sda[0] sdc[2] sdb[1] > 3145536 blocks level 5, 64k chunk, algorithm 2 [4/3] [UUU_] > > mdadm --misc --stop /dev/md0 > mdadm --assemble /dev/md0 /dev/sda /dev/sdb /dev/sdc /dev/sdd > > md0 : active raid5 sdd[3](S) sda[0] sdc[2] sdb[1] > 3145536 blocks level 5, 64k chunk, algorithm 2 [4/3] [UUU_] This really should have started a recovery.... I'll look into that too. > > For my second try I actually read the directions and things went much > better, aside from a possible /proc/mdstat glitch shown below. > > mdadm --create -l5 -n3 /dev/md0 /dev/sda /dev/sdb /dev/sdc > > md0 : active raid5 sda[0] sdc[2] sdb[1] > 2097024 blocks level 5, 64k chunk, algorithm 2 [3/3] [UUU] > > mdadm --manage --add /dev/md0 /dev/sdd > > md0 : active raid5 sdd[3](S) sdc[2] sdb[1] sda[0] > 2097024 blocks level 5, 64k chunk, algorithm 2 [3/3] [UUU] > > mdadm --grow -n4 /dev/md0 > > md0 : active raid5 sdd[3] sdc[2] sdb[1] sda[0] > 2097024 blocks level 5, 64k chunk, algorithm 2 [4/4] [UUUU] > ...should this be... --> [4/3] [UUU_] perhaps? Well, part of the array is "4/4 UUUU" and part is "3/3 UUU". How do you represent that? I think "4/4 UUUU" is best. > [>....................] recovery = 0.4% (5636/1048512) finish=9.1min speed=1878K/sec > > [...time passes...] > > md0 : active raid5 sdd[3] sdc[2] sdb[1] sda[0] > 3145536 blocks level 5, 64k chunk, algorithm 2 [4/4] [UUUU] > > My final test was a repeat of #2, but with data actively being written > to the array during the reshape (the previous tests were on an idle, > unmounted array). This one failed pretty hard, with several processes > ending up in the D state. I repeated it twice and sysrq-t dumps can be > found at <http://www.kroptech.com/~adk0212/md-raid5-reshape-wedge.txt>. > The writeout load was a kernel tree untar started shortly before the > 'mdadm --grow' command was given. mdadm hung, as did tar. Any process > which subsequently attmpted to access the array hung as well. A second > attempt at the same thing hung similarly, although only pdflush shows up > hung in that trace. mdadm and tar are missing for some reason. Hmmm... I tried similar things but didn't get this deadlock. Somehow the fact that mdadm is holding the reconfig_sem semaphore means that some IO cannot proceed and so mdadm cannot grab and resize all the stripe heads... I'll have to look more deeply into this. > > I'm happy to do more tests. It's easy to conjur up virtual disks and > load them with irrelevant data (like kernel trees ;) Great. I'll probably be putting out a new patch set late this week or early next. Hopefully it will fix the issues you can found and you can try it again.. Thanks again, NeilBrown - To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html