Hello,

On Mon, 25 Jun 2012 15:06:51 +0900 Christian Balzer wrote:

> 
> Hello Neil,
> 
> On Mon, 25 Jun 2012 14:07:54 +1000 NeilBrown wrote:
> 
> > On Fri, 22 Jun 2012 17:42:57 +0900 Christian Balzer <chibi@xxxxxxx>
> > wrote:
> > 
> > > 
> > > Hello,
> > > 
> > > On Fri, 22 Jun 2012 18:07:48 +1000 NeilBrown wrote:
> > > 
> > > > On Fri, 22 Jun 2012 16:06:32 +0900 Christian Balzer <chibi@xxxxxxx>
> > > > wrote:
> > > > 
> > > > > 
> > > > > Hello,
> > > > > 
> > > > > the basics first:
> > > > > Debian Squeeze, custom 3.2.18 kernel.
> > > > > 
> > > > > The Raid(s) in question are:
> > > > > ---
> > > > > Personalities : [raid1] [raid10]
> > > > > md4 : active raid10 sdd1[0] sdb4[5](S) sdl1[4] sdk1[3] sdj1[2] sdi1[1]
> > > > >       3662836224 blocks super 1.2 512K chunks 2 near-copies [5/5] [UUUUU]
> > > > 
> > > > I'm stumped by this.  It shouldn't be possible.
> > > > 
> > > > The size of the array is impossible.
> > > > 
> > > > If there are N chunks per device, then there are 5*N chunks on the
> > > > whole array, and there are two copies of each data chunk, so
> > > > 5*N/2 distinct data chunks, so that should be the size of the
> > > > array.
> > > > 
> > > > So if we take the size of the array, divide by chunk size, multiply
> > > > by 2, divide by 5, we get N = the number of chunks per device.
> > > > i.e.
> > > >    N = (array_size / chunk_size) * 2 / 5
> > > > 
> > > > If we plug in 3662836224 for the array size and 512 for the chunk
> > > > size, we get 2861590.8, which is not an integer.
> > > > i.e. impossible.
> > > > 
> > > Quite right, though I never bothered to check that number of course,
> > > pretty much assuming, after using Linux MD since the last millennium,
> > > that it would get things right. ^o^
> > > 
> > > > What does "mdadm --examine" of the various devices show?
> > > > 
> > > They all look identical and sane to me:
> > > ---
> > > /dev/sdc1:
> > >           Magic : a92b4efc
> > >         Version : 1.2
> > >     Feature Map : 0x0
> > >      Array UUID : 2b46b20b:80c18c76:bcd534b5:4d1372e4
> > >            Name : borg03b:3  (local to host borg03b)
> > >   Creation Time : Sat May 19 01:07:34 2012
> > >      Raid Level : raid10
> > >    Raid Devices : 5
> > > 
> > >  Avail Dev Size : 2930269954 (1397.26 GiB 1500.30 GB)
> > >      Array Size : 5860538368 (2794.52 GiB 3000.60 GB)
> > >   Used Dev Size : 2930269184 (1397.26 GiB 1500.30 GB)
> > >     Data Offset : 2048 sectors
> > >    Super Offset : 8 sectors
> > >           State : clean
> > >     Device UUID : fe922c1c:35319892:cc1e32e9:948d932c
> > > 
> > >     Update Time : Fri Jun 22 17:12:05 2012
> > >        Checksum : 27a61d9a - correct
> > >          Events : 90893
> > > 
> > >          Layout : near=2
> > >      Chunk Size : 512K
> > > 
> > >     Device Role : Active device 0
> > >     Array State : AAAAA ('A' == active, '.' == missing)
> > 
> > Thanks.
> > With this extra info - and the clearer perspective that morning
> > provides - I see what is happening.
> > 
> Ah, thank goodness for that. ^.^
> 

The patch worked fine:
---
[  105.872117] md: recovery of RAID array md3
[28981.157157] md: md3: recovery done.
---

Thanks a bunch, and I'd suggest including this patch in any and all
feasible backports and future kernels, of course.

Regards,

Christian

> > The following kernel patch should make it work for you.  It was made
> > and tested against 3.4, but should apply to your 3.2 kernel.
> > 
> > The problem only occurs when recovering the last device in certain
> > RAID10 arrays.  If you had > 2 copies (e.g. --layout=n3) it could be
> > more than just the last device.
> > 
> > RAID10 with an odd number of devices (5 in this case) lays out chunks
> > like this:
> > 
> >  A A B B C
> >  C D D E E
> >  F F G G H
> >  H I I J J
> > 
> > If you have an even number of stripes, everything is happy.
> > If you have an odd number of stripes - as is the case with your
> > problem array - then the last stripe might look like:
> > 
> >  F F G G H
> > 
> > The 'H' chunk only exists once.  There is no mirror for it.
> > md does not store any data in this chunk - the size of the array is
> > calculated to finish after 'G'.
> > However the recovery code isn't quite so careful.  It tries to recover
> > this chunk and loads it from beyond the end of the first device -
> > which is where it would be if the devices were all a bit bigger.
> > 
> That makes perfect sense; I'm just amazed to be the first one to
> encounter this.  Granted, most people will have an even number of
> stripes based on typical controller and server backplanes (1U -> 4x
> 3.5" drives), but the ability to use odd numbers (and gain the
> additional speed another spindle adds) was always one of the nice
> points of the MD RAID10 implementation.
> 
> > So there is no risk of data corruption here - just that md tries to
> > recover a block that isn't in the array, fails, and aborts the
> > recovery.
> > 
> That's a relief!
> 
> > This patch gets it to complete the recovery earlier so that it doesn't
> > try (and fail) to do the impossible.
> > 
> > If you could test and confirm, I'd appreciate it.
> > 
> I've built a new kernel package (taking the opportunity to go to 3.2.20)
> and the associated drbd module, and scheduled downtime for tomorrow.
> 
> Should know if this fixes it by Wednesday.
> 
> Many thanks,
> 
> Christian
> 
> > Thanks,
> > NeilBrown
> > 
> > diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
> > index 99ae606..bcf6ea8 100644
> > --- a/drivers/md/raid10.c
> > +++ b/drivers/md/raid10.c
> > @@ -2890,6 +2890,12 @@ static sector_t sync_request(struct mddev *mddev, sector_t sector_nr,
> >  			/* want to reconstruct this device */
> >  			rb2 = r10_bio;
> >  			sect = raid10_find_virt(conf, sector_nr, i);
> > +			if (sect >= mddev->resync_max_sectors) {
> > +				/* last stripe is not complete - don't
> > +				 * try to recover this sector.
> > +				 */
> > +				continue;
> > +			}
> >  			/* Unless we are doing a full sync, or a replacement
> >  			 * we only need to recover the block if it is set in
> >  			 * the bitmap

-- 
Christian Balzer        Network/Systems Engineer
chibi@xxxxxxx           Global OnLine Japan/Fusion Communications
http://www.gol.com/
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html
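
For anyone reading along, below is a minimal C sketch - not from the
original messages, and only a simplified model of the near-copies layout
rather than the kernel's raid10_find_virt()/sync_request() code - that
reproduces Neil's size arithmetic and prints the chunk grid he describes.
The array and chunk sizes are taken from the /proc/mdstat output above;
DEV_CHUNKS (chunks per device to draw) is an arbitrary odd value chosen
to show the single-copy last chunk.
---
/*
 * Minimal sketch: reproduce the size arithmetic and print the raid10
 * "near=2" chunk layout for 5 devices.  Simplified model only, not the
 * kernel code.
 */
#include <stdio.h>

#define RAID_DISKS   5
#define NEAR_COPIES  2
#define DEV_CHUNKS   3   /* chunks per device to draw; odd, like the problem array */

int main(void)
{
	/* 1) Neil's arithmetic: N = (array_size / chunk_size) * 2 / 5 */
	unsigned long long array_kb = 3662836224ULL;   /* from /proc/mdstat, 1K blocks */
	unsigned long long chunk_kb = 512;
	unsigned long long data_chunks = array_kb / chunk_kb;
	unsigned long long copies = data_chunks * NEAR_COPIES;
	char grid[DEV_CHUNKS][RAID_DISKS];
	int slots = DEV_CHUNKS * RAID_DISKS;
	int d, c;

	printf("chunks per device = %llu, remainder %llu\n",
	       copies / RAID_DISKS, copies % RAID_DISKS);
	/* Prints 2861590 remainder 4 - the non-integer 2861590.8 that
	 * Neil points out above.
	 */

	/* 2) The near layout: data chunk d, copy c lands in global slot
	 * d*NEAR_COPIES + c, filled row by row across the devices.
	 */
	for (d = 0; d < DEV_CHUNKS; d++)
		for (c = 0; c < RAID_DISKS; c++)
			grid[d][c] = '.';

	for (d = 0; d * NEAR_COPIES < slots; d++) {
		for (c = 0; c < NEAR_COPIES; c++) {
			int slot = d * NEAR_COPIES + c;
			if (slot >= slots)
				break;  /* the orphan chunk: only one copy fits */
			grid[slot / RAID_DISKS][slot % RAID_DISKS] = 'A' + d % 26;
		}
	}

	for (d = 0; d < DEV_CHUNKS; d++) {
		for (c = 0; c < RAID_DISKS; c++)
			printf("%c ", grid[d][c]);
		printf("\n");
	}
	/* With an odd number of stripes the last chunk ('H' here) has no
	 * mirror; md sizes the array to end after 'G', which is why the
	 * patch above stops recovery at mddev->resync_max_sectors.
	 */
	return 0;
}
---
Compiled and run, the second part should print the first three rows of
Neil's diagram, with 'H' appearing only once.  Change DEV_CHUNKS to an
even value and every chunk gets both copies, which is the "even number
of stripes, everything is happy" case.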