On Wed, May 09, 2012 at 02:17:33PM +0200, piergiorgio.sartor@xxxxxxxx wrote:
> Hi Neil,
>
> thanks a lot for the quick answer, please see the
> text embedded below for further details.
>
> ----- Original Message ----
> From: NeilBrown <neilb@xxxxxxx>
> To: piergiorgio.sartor@xxxxxxxx
> Date: 09.05.2012 13:03
> Subject: Re: Another RAID-5 problem
>
> > On Wed, 9 May 2012 11:10:58 +0200 (CEST) piergiorgio.sartor@xxxxxxxx wrote:
> >
> > > Hi all,
> > >
> > > we've been hit by a RAID-5 issue; it seems Ubuntu 12.04 is shipping
> > > a buggy kernel/mdadm combination.
> >
> > Buggy kernel. My fault. I think they know and an update should follow.
> >
> > However I suspect that Ubuntu must be doing something else to cause the
> > problem to trigger so often. The circumstance that makes it happen should
> > be extremely rare. It is as though the md array is half-stopped just before
> > shutdown. If it were completely stopped or not stopped at all, this
> > wouldn't happen.
> >
> > > Following the other thread about a similar issue, I understood
> > > it is possible to fix the array without losing data.
> >
> > Correct.
> >
> > > Problems are:
> > >
> > > 1) We do not know the HDD order and it is a 5-disk RAID-5
> >
> > If you have kernel logs from the last successful boot they would contain
> > a "RAID conf printout" which would give you the order, but maybe that is on
> > the RAID-5 array?
>
> Unfortunately, the kernel logs are on the PC itself, so
> we cannot get them.
>
> > If it is you will have to try different permutations until you find one
> > that works.
>
> I have some questions about this topic.
>
> We have other, identical PCs, which were built at more or less
> the same time as this one.
> One of these has a similar history, that is, a 4-drive RAID-5
> later extended to 5 (BTW, Ubuntu 10.10 shipped mdadm 2.6.7.1;
> we extended the array later with some 3.1 or 3.2, which could explain
> the data offset difference).
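A minimal Python 3 sketch of the "try different permutations" approach, as an illustration rather than a tool from this thread. It assumes the four components sharing the 264-sector data offset are /dev/sda2, /dev/sdb2, /dev/sdc2 and /dev/sdd2 (hypothetical names), keeps one slot as "missing" for the disk with the 1048-sector offset, and uses the 512K chunk and metadata 1.1 reported in the thread. It only builds and prints candidate mdadm command lines from options mentioned in the thread; nothing is executed, and at the time --data-offset required a newer mdadm (the 'r10-reshape' branch Neil mentions).

```python
from itertools import permutations
import shlex

SECTOR_BYTES = 512  # md data offsets are counted in 512-byte sectors


def data_offset_arg(sectors: int) -> str:
    """Convert a data offset in sectors (as shown by mdadm -E) to the 'NNNK'
    form used in this thread, e.g. 264 sectors -> '132K'."""
    size = sectors * SECTOR_BYTES
    if size % 1024:
        raise ValueError("data offset is not a whole number of KiB")
    return f"{size // 1024}K"


def candidate_commands(devices, md="/dev/md1", chunk_kib=512,
                       offset_sectors=264, metadata="1.1"):
    """Yield one 'mdadm -C' command line per ordering of the devices.

    One slot is always 'missing' (the disk with the different data offset),
    so 4 devices give 5! = 120 candidates. Nothing is run here.
    """
    slots = list(devices) + ["missing"]
    for order in permutations(slots):
        argv = ["mdadm", "-C", md, f"--metadata={metadata}", "-l5",
                f"-n{len(slots)}", "--assume-clean", f"--chunk={chunk_kib}",
                f"--data-offset={data_offset_arg(offset_sectors)}", *order]
        yield shlex.join(argv)


if __name__ == "__main__":
    # Hypothetical device names; adjust to what the rescue system shows.
    devs = ["/dev/sda2", "/dev/sdb2", "/dev/sdc2", "/dev/sdd2"]
    cmds = list(candidate_commands(devs))
    print(len(cmds), "candidates")
    print(cmds[0])
```

Each candidate would then be followed by activating LVM and running "fsck -n", as suggested in the thread, and by stopping the array before the next attempt. If the slot of the late-added disk is known (for example from its "Device Role" in mdadm -E), fixing "missing" at that position shrinks the list from 5! = 120 to 4! = 24.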
>
> This identical PC shows the following (mdadm -D /dev/md1):
>
> ...
>     Number   Major   Minor   RaidDevice State
>        0       8       34        0      active sync   /dev/sdc2
>        1       8       18        1      active sync   /dev/sdb2
>        2       8        2        2      active sync   /dev/sda2
>        5       8       50        3      active sync   /dev/sdd2
>        4       8       66        4      active sync   /dev/sde2
>
> In this case I assume "RaidDevice" indicates the order.
> Is this correct? We could try with this one first.
> What about "Number"? Why is 3 missing?
> BTW, the broken RAID still has a valid /dev/sdd2, and "mdadm -E"
> shows:
>
> ...
>     Device Role : Active device 3
> ...
>
> which seems consistent with the working one.
>
> Nevertheless, there is something fishy.
> If I try the "dd" command you suggested below, the drive
> which seems to show consistent LVM data is /dev/sde2,
> not /dev/sdc2.
>
> Specifically (dd with the proper skip, i.e. 1048 for /dev/sde2):
>
> VolGroup {
>     id = "eK5Sde-ENzo-0iBO-dJIB-buBt-BnoX-NEmZ1v"
>     seqno = 1759
>     status = ["RESIZEABLE", "READ", "WRITE"]
> ...
>
> The others (with skip 264) either have zeros or some
> LVM text, but nothing that looks properly aligned.
>
> The question is whether the growth somehow changed the
> order, and in that case, how will "Create" behave, considering
> that one drive will be missing?
>
> > > 2) 4 of 5 disks have a data offset of 264 sectors, while the
> > > fourth one, added later, has 1048 sectors.
> >
> > Ouch.
> > It would be easiest to just make a degraded array with the 4 devices with
> > the same data offset, then add the 5th later.
> > To get the correct data offset you could either use the same mdadm that
> > the array was originally built with, or you could get the 'r10-reshape'
> > branch from git://neil.brown.name/mdadm/ and build that.
> > Then create the array with --data-offset=132K as well as all the other
> > flags.
> > However that hasn't been tested extensively so it would be best to test it
> > elsewhere first. Check that it created the array with correct data-offset
> > and correct size.
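The "dd and look for LVM text" check can also be scripted. This Python 3 sketch (hypothetical function name, not from the thread) opens a component read-only, reads a few sectors starting at its md data offset, and searches for "LABELONE", the identifier LVM2 writes in its physical-volume label (by default in the second sector of the PV). The 8-sector window is an assumption. Assuming the PV was created directly on the md device, a hit suggests the component holds the first data chunk, which with md's default left-symmetric RAID-5 layout is RaidDevice 0.

```python
import os

SECTOR = 512
LVM2_LABEL_ID = b"LABELONE"  # id string at the start of an LVM2 PV label


def find_lvm_label(path: str, data_offset_sectors: int, window_sectors: int = 8):
    """Look for an LVM2 label in the first few sectors after the md data
    offset of one component.

    Returns the byte position relative to the start of the array data,
    or None if no label is found. The file is opened read-only, so nothing
    on disk is modified.
    """
    with open(path, "rb") as f:
        f.seek(data_offset_sectors * SECTOR)
        buf = f.read(window_sectors * SECTOR)
    pos = buf.find(LVM2_LABEL_ID)
    return None if pos < 0 else pos
```

Run as root from the rescue system, for example find_lvm_label("/dev/sde2", 1048) and find_lvm_label("/dev/sdc2", 264); only one component is expected to return a position (typically 512).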
> >
> > > 3) There is an LVM setup on the array, not a plain filesystem.
> >
> > That does make it a little more complex but not much.
> > You would need to activate the LVM, then "fsck -n" the filesystems to check
> > if you have the devices in the right order.
> > However this could help you identify the first device quickly.
> > If you
> >     dd if=/dev/sdXX skip=264 count=1
> > then for the first device in the array it will show you the textual
> > description of the LVM setup. For the other devices it will probably be
> > binary or something unrelated.
> >
> > > Any idea how we can get the array back without losing any
> > > data?
> >
> > Do you know what the chunk size was? Probably 64K if it was an old array.
> > Maybe 512K though.
>
> Chunk size we know. As mentioned above, we have other PCs,
> all the same; the chunk is 512K.
> Metadata is 1.1.
>
> Bitmap was activated, but this, I understand, is not a problem.
> Furthermore, "mdadm -X" on each HDD shows 0 dirty bits,
> which looks good to me.
>
> > I would:
> > 1/ look at old logs if possible to find out the device order
> > 2/ try to remember what the chunk size could be. If you have the exact
> >    used-device size (mdadm -E should give that) you can get an upper limit
> >    for the chunk size by finding the largest power-of-2 which divides it.
> > 3/ Try to identify the first device by looking for LVM metadata.
> > 4/ Make a list of the possible arrangements of devices and possible chunk
> >    sizes based on the info you collected.

Actually, we solved this issue in a "creative" way.

Looking at:

https://raid.wiki.kernel.org/index.php/RAID_superblock_formats

we identified the proper addresses and how to look up the component
order, and, using "od -Ax -tx4 /dev/sdXi | less", we were able to
work out the device order.

For newcomers, please note that address 0xA0 holds the device number,
and this number is an index into the table starting at 0x100, where
the device roles are stored: the role of device n is at 0x100 + 2n.
With metadata 1.1 the superblock sits at the very start of the
component, so these addresses are offsets from the beginning of the
partition.
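Neil's step 2/ (an upper limit for the chunk size) is simple arithmetic: md rounds each RAID-5 member's used size down to a multiple of the chunk size, and a power-of-two chunk that divides the size cannot exceed the largest power of two dividing it. A Python 3 sketch, assuming the used device size is given in 512-byte sectors as mdadm -E prints it for 1.x metadata; the function names and the 4 KiB lower end of the candidate list are illustrative choices.

```python
SECTOR_BYTES = 512


def chunk_upper_bound_bytes(used_dev_sectors: int) -> int:
    """Largest power of two that divides the used device size (in bytes).

    A power-of-two chunk size must divide the used size, so it cannot be
    larger than this value.
    """
    if used_dev_sectors <= 0:
        raise ValueError("used device size must be positive")
    size = used_dev_sectors * SECTOR_BYTES
    return size & -size  # isolates the lowest set bit = largest power-of-2 divisor


def candidate_chunks_kib(used_dev_sectors: int, smallest_kib: int = 4):
    """All power-of-two chunk sizes (KiB) from smallest_kib up to the bound."""
    bound_kib = chunk_upper_bound_bytes(used_dev_sectors) // 1024
    chunks, c = [], smallest_kib
    while c <= bound_kib:
        chunks.append(c)
        c *= 2
    return chunks
```

For a used size of 3072 sectors, candidate_chunks_kib(3072) returns [4, 8, 16, 32, 64, 128, 256, 512], which includes both the 64K and 512K guesses from the thread.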
While the value at 0xA0 is a 4-byte integer, the entries at 0x100 are
2-byte integers (short int), so "-tx4" in "od" swaps (due to the CPU's
little-endian byte order) each pair of short ints, and the order reads
1 0 3 2 5 4 ...
If "-tx2" is used, then the entries at 0x100 will be correct, but the
4-byte value at 0xA0 will have its two 16-bit halves swapped.

Fortunately, these fields of the superblock seemed intact. Other
information, like the RAID level, was completely wiped out...

The only itch left is the data offset.
We plan to try mdadm 2.6.7.1 (which originally created the array)
from an Ubuntu 10.10 desktop live system.

Still, I would like some advice on backing up the first few MB of each
component with "dd", and on switching the array to read-only, in order
to avoid damage by LVM/mount.

Thanks again,

bye,

pg

> > 5/ Check that you can create an array with a data-offset of 264 sectors
> >    using one of the approaches listed above.
> > 6/ Write a script which iterates through the possibilities and re-creates
> >    the array, then tries to turn on LVM and then fsck. Or maybe iterate by
> >    hand.
> >    The command to create an array would be something like
> >      mdadm -C /dev/md0 -l5 -n5 --assume-clean --chunk=64 \
> >         --data-offset=132K /dev/sdX missing /dev/sdY /dev/sdZ /dev/sdW
> > 7/ Find out which arrangement produces the fewest fsck errors, and use that.
>
> I do have another question.
>
> How about starting the RAID in read-only mode?
> This would prevent LVM or mount from writing anything and
> risking damage to the different superblocks.
> What would be the best way to do this?
> After "Create", just "mdadm --read-only /dev/md1"?
>
> One more: how about dumping, with "dd", the first
> few MB of each drive as a backup? Does that make sense?
>
> Thanks again for the support,
>
> bye,
>
> pg
>
> > > At the moment, it seems quite difficult to provide a dump of
> > > "mdadm -E" or similar, since the PC does not boot at all.
> > > In any case, if necessary we could try to take a picture of
> > > the screen and send it here or directly by email, if appropriate.
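To make the superblock lookup concrete, here is a Python 3 sketch (an illustration, not a tool used in this thread; function names are hypothetical) that decodes the same fields from a raw v1.x superblock: data_offset at 0x80 (in sectors), dev_number at 0xA0, and the 16-bit role table at 0x100. It assumes metadata 1.1, so the superblock is read from offset 0 of the component; 0xFFFF and 0xFFFE are the role values md uses for spare and faulty. The od_tx4 helper shows why "od -tx4" prints the 16-bit role entries in swapped pairs on a little-endian machine.

```python
import struct

MD_SB_MAGIC = 0xA92B4EFC   # v1.x superblock magic, stored little-endian
DATA_OFFSET_OFF = 0x80     # __le64 data_offset, in 512-byte sectors
DEV_NUMBER_OFF = 0xA0      # __le32 dev_number of this component
DEV_ROLES_OFF = 0x100      # __le16 dev_roles[], indexed by dev_number
ROLE_NAMES = {0xFFFF: "spare", 0xFFFE: "faulty"}


def read_superblock(path: str, sb_offset: int = 0, length: int = 4096) -> bytes:
    """Read raw superblock bytes (read-only). Metadata 1.1 keeps the
    superblock at the very start of the component, hence sb_offset=0."""
    with open(path, "rb") as f:
        f.seek(sb_offset)
        return f.read(length)


def decode_v1_superblock(sb: bytes) -> dict:
    """Extract data offset, device number and role from a v1.x superblock."""
    (magic,) = struct.unpack_from("<I", sb, 0)
    if magic != MD_SB_MAGIC:
        raise ValueError("no md v1.x superblock at this position")
    (data_offset,) = struct.unpack_from("<Q", sb, DATA_OFFSET_OFF)
    (dev_number,) = struct.unpack_from("<I", sb, DEV_NUMBER_OFF)
    # Each role entry is 2 bytes, so this device's role sits at 0x100 + 2*n.
    (role,) = struct.unpack_from("<H", sb, DEV_ROLES_OFF + 2 * dev_number)
    return {"data_offset": data_offset, "dev_number": dev_number,
            "role": ROLE_NAMES.get(role, role)}


def od_tx4(buf: bytes) -> str:
    """Mimic 'od -tx4' on a little-endian CPU: print each 4-byte group as
    one 32-bit hex number (leftover bytes that don't fill a group are dropped)."""
    n = len(buf) // 4
    return " ".join(f"{w:08x}" for w in struct.unpack(f"<{n}I", buf[: 4 * n]))


if __name__ == "__main__":
    # Roles 0, 1, 2, 3 stored as consecutive 16-bit entries are printed by
    # od -tx4 as "00010000 00030002", i.e. they read as 1 0 3 2.
    print(od_tx4(struct.pack("<4H", 0, 1, 2, 3)))
```

On the rescue system, decode_v1_superblock(read_superblock("/dev/sdd2")) run as root should report the same role number that mdadm -E shows as "Device Role", together with the component's data offset (264 or 1048 sectors here).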
> >
> > You probably need to boot from a DVD-ROM or similar.
> > Certainly feel free to post the data you collect and the conclusions you
> > draw, and even the script you write, if you would like them reviewed and
> > confirmed.
> >
> > NeilBrown
> >
> > --
> > piergiorgio
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at http://vger.kernel.org/majordomo-info.html

--
piergiorgio