Hi Neil,

Comments interspersed..

--- On Tue, 15/2/11, NeilBrown <neilb@xxxxxxx> wrote:

> From: NeilBrown <neilb@xxxxxxx>
> Subject: Re: mdadm: recovering from an aborted reshape op - boot messages
> To: "Gavin Flower" <gavinflower@xxxxxxxxx>
> Cc: linux-raid@xxxxxxxxxxxxxxx
> Date: Tuesday, 15 February, 2011, 12:55
>
> On Mon, 14 Feb 2011 14:47:48 -0800 (PST) Gavin Flower <gavinflower@xxxxxxxxx> wrote:
>
> > Hi Neil,
> >
> > I did not notice this before (note: I have poor eyesight, so unless I explicitly look, I may not notice things!). but just before Fedora drops to the shell on a reboot I saw these messages (hand transcribed, so might have the odd transcription error):
> >
> > /dev/md1: The filing system size (according to the superblock) is 76799952 blocks
> > The physical size of the device is 76799616
> > Either the superblock or the partition table is likely to be corrupt!
> >
> > /dev/md1: UNEXPECTED INCONSISTENCY: RUN fsck manually
> > (i.e. without -a or -p options)
> >
> > Note that original size according mdadm was not a multiple of 512KB, so I reshaped it to be the largest multiple or 512KB less than the original size. So my second attempt to reshape, using the 512 chunk size, started okay.
> >
> > Advice appreciated.
>
> Hmmm....
>
> Firstly, the -A and -E output you sent are inconsistent.

I can not explain the inconsistency. However, they were both done on the same machine ('saturn'). No software updates were done on 'saturn' since before the reshaping. The -A output was the process that took over an hour.

> The "-A" output reports:
>
> mdadm:/dev/md1 has an active reshape - checking if critical section needs to be restored
>
> For 0.90 metadata (which you are using), that can only be reported if the
> minor number is at least 91. i.e. it has been temporarily set to 0.91.
>
> However the "-E" output show that all devices are "0.90.00", not 0.91.

I grepped strings /sbin/mdadm for '.9', and found both '0.90' and '0.91', for what it is worth. ls on /sbin/mdadm gives a size of 362296 bytes and a date of 5 Aug 2010. The version is v3.1.2 - 10th March 2010.

> So those devices cannot possibly produce that -A output.

The output was sent directly to the USB stick, so there are no transcription errors. So as far as I can tell, these devices did produce the output. They are the only devices I have accessed using RAID in many months. There are only the 5 hard disks on 'saturn'.

Is there anything I can do to track down this anomaly?

> The devices appear to have all completely transitioned to 512K chunksize....
>
> And the -D output seems to show that the array is fine and working properly.
>
> Secondly, as you say you reshaped the array to make it slightly smaller so it
> would be a multiple of 512K. This is obviously needed to change the chunk size.

I used the --size= option of mdadm.

> But before you did that - did you resize the filesystem to be only that big?

No, and there is no mention in man mdadm of doing so, that I could see.

> I suspect not. So the filesystem thinks that it is bigger than the device.
> I don't know how best to fix that.

I would have thought mdadm would have done that as part of the process, as surely the size of the filesystem could not be reduced in advance of the reshaping. Perhaps I have overlooked the obvious?

> You could try running 'resize2fs" now (was it ext3? I don't remember). Or
> maybe an 'fsck -f' might fix it.
>
> It might be safest to ask on ext3-users@xxxxxxxxxxx
> Report that you shrunk your array before shrinking the filesystem and ask
> what the best remedial strategy is.
>
> NeilBrown

I will look into your other suggestions about recovery. If there is anything further I can do to provide useful diagnostics, please let me know.

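The size mismatch reported in the boot messages can be checked with simple arithmetic. The following is only a sketch: the two block counts are the hand-transcribed values from the fsck message, while the 4 KiB block size is an assumption (a common ext3 default) that has not been confirmed anywhere in this thread, and the names are made up for illustration.

```python
# Values hand-transcribed from the boot messages quoted above.
FS_BLOCKS = 76799952    # filesystem size according to the superblock
DEV_BLOCKS = 76799616   # physical size of /dev/md1 in the same units

# ASSUMPTION: 4 KiB filesystem blocks; not confirmed in this thread.
ASSUMED_BLOCK_KIB = 4


def shortfall_blocks(fs_blocks: int, dev_blocks: int) -> int:
    """Number of blocks by which the filesystem extends past the device."""
    return fs_blocks - dev_blocks


missing = shortfall_blocks(FS_BLOCKS, DEV_BLOCKS)
print(f"filesystem is {missing} blocks larger than the device")
print(f"under the assumed block size that is {missing * ASSUMED_BLOCK_KIB} KiB")
```

A positive result means the filesystem still claims space beyond the end of the shrunken array, which is the condition the fsck message complains about.
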
Thanks,
Gavin