Re: fd partitions gone from 2 discs, md happy with it and reconstructs... bye bye datas

On Tue, 4 Jan 2011 10:11:10 +0100 "Philippe PIOLAT" <piolat@xxxxxxxxxxxx>
wrote:

> Hey gurus, need some help badly with this one.
> I run a server with a 6Tb md raid5 volume built over 7*1Tb disks.
> I've had to shut down the server lately and when it went back up, 2 out of
> the 7 disks used for the raid volume had lost their configuration:

I should say up front that I suspect you have lost your data.  However there
is enough here that doesn't make sense that I cannot be certain of anything.

> 
> dmesg :
> [   10.184167]  sda: sda1 sda2 sda3 // System disk
> [   10.202072]  sdb: sdb1
> [   10.210073]  sdc: sdc1
> [   10.222073]  sdd: sdd1
> [   10.229330]  sde: sde1
> [   10.239449]  sdf: sdf1
> [   11.099896]  sdg: unknown partition table
> [   11.255641]  sdh: unknown partition table

If sdg and sdh had a partition table before but don't now, then at least the
first block of each of those devices has been corrupted.  In that case we
must assume that an unknown number of blocks at the start of those drives has
been corrupted, so you could already have lost critical data at that point,
and nothing you could have done since would have helped.
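You can at least see what the start of those devices looks like now without
risking anything further, since this only reads (device names as in your
dmesg):

   # Dump the first few sectors of each device; all zeroes suggests the
   # sectors were simply wiped, anything else suggests something wrote
   # over them.
   dd if=/dev/sdg bs=512 count=4 2>/dev/null | hexdump -C | head -20
   dd if=/dev/sdh bs=512 count=4 2>/dev/null | hexdump -C | head -20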


> 
> All 7 disks have same geometry and were configured alike :
> 
> dmesg :
> Disk /dev/sdb: 1000.2 GB, 1000204886016 bytes
> 255 heads, 63 sectors/track, 121601 cylinders
> Units = cylinders of 16065 * 512 = 8225280 bytes
> Disk identifier: 0x1e7481a5
> 
>    Device Boot      Start         End      Blocks   Id  System
> /dev/sdb1               1      121601   976760001   fd  Linux raid
> autodetect

So the partition started 16065 sectors from the start of the device.
This is not a multiple of 64K, which is good.
If a partition starts at a multiple of 64K from the start of the device and
extends to the end of the device, then the md metadata on the partition could
look like it was on the disk as well.
When mdadm sees a situation like this it will complain, but that cannot have
been happening to you.
So when the partition table was destroyed, mdadm should not have been able to
see the metadata that belonged to the partition.
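If you want to check where a 0.90 superblock could actually live on the whole
device, here is a rough sketch, assuming the usual 0.90 placement in the last
64K-aligned 64K of the device (only reads, so safe):

   # Where a 0.90 superblock would sit on the whole device.
   SZ=$(blockdev --getsize64 /dev/sdg)
   OFF=$(( SZ / 65536 * 65536 - 65536 ))
   # If a superblock really lives there, the first 4 bytes should be the md
   # magic 0xa92b4efc (stored little-endian, so fc 4e 2b a9 on x86).
   dd if=/dev/sdg bs=65536 skip=$(( OFF / 65536 )) count=1 2>/dev/null | hexdump -C | head -2

Repeating the same arithmetic with the size of the partition (and adding the
partition's start offset) shows where the superblock for sdg1 would be; with
your geometry the two locations should not coincide.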


> 
> All 7 disks (sdb1, sdc1, sdd1, sde1, sdf1, sdg1, sdh1) were used in a md
> raid5 xfs volume.
> When booting, md, which was (obviously) out of sync, kicked in and
> automatically started rebuilding over the 7 disks, including the two
> "faulty" ones; xfs tried to do some shenanigans as well:
> 
> dmesg :
>  [   19.566941] md: md0 stopped.
> [   19.817038] md: bind<sdc1>
> [   19.817339] md: bind<sdd1>
> [   19.817465] md: bind<sde1>
> [   19.817739] md: bind<sdf1>
> [   19.817917] md: bind<sdh>
> [   19.818079] md: bind<sdg>
> [   19.818198] md: bind<sdb1>
> [   19.818248] md: md0: raid array is not clean -- starting background
> reconstruction
> [   19.825259] raid5: device sdb1 operational as raid disk 0
> [   19.825261] raid5: device sdg operational as raid disk 6
> [   19.825262] raid5: device sdh operational as raid disk 5
> [   19.825264] raid5: device sdf1 operational as raid disk 4
> [   19.825265] raid5: device sde1 operational as raid disk 3
> [   19.825267] raid5: device sdd1 operational as raid disk 2
> [   19.825268] raid5: device sdc1 operational as raid disk 1
> [   19.825665] raid5: allocated 7334kB for md0
> [   19.825667] raid5: raid level 5 set md0 active with 7 out of 7 devices,
> algorithm 2

... however it is clear that mdadm (and md) saw metadata at the end of the
device which exactly matched the metadata on the other devices in the array.

This is very hard to explain.  I can only think of three explanations, none
of which seems particularly likely:

1/ The partition table on sdg and sdh actually placed the first partition at
    a multiple of 64K unlike all the other devices in the array.
2/ someone copied the superblock from the end of sdg1 to the end of sdg, and
   also for sdh1 to sdh.
   Given that the first block of both devices was changed too, a command like:
     dd if=/dev/sdg1 of=/dev/sdg
   would have done it.   But that seems extremely unlikely.
3/ The array previously consisted of 5 partitions and 2 whole devices.
   I have certainly seen this happen before, usually by accident.
   But if this were the case, your data should all be intact.  Yet it isn't.
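For explanation 1, once /dev/sdg has a partition table again, listing it in
sectors rather than cylinders makes the check easy (just an illustration; it
obviously cannot tell you what the old table looked like):

   # Show the partition start in sectors; a start that is a multiple of
   # 128 sectors (64K) would make explanation 1 possible.
   fdisk -lu /dev/sdg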



> [   19.825669] RAID5 conf printout:
> [   19.825670]  --- rd:7 wd:7
> [   19.825671]  disk 0, o:1, dev:sdb1
> [   19.825672]  disk 1, o:1, dev:sdc1
> [   19.825673]  disk 2, o:1, dev:sdd1
> [   19.825675]  disk 3, o:1, dev:sde1
> [   19.825676]  disk 4, o:1, dev:sdf1
> [   19.825677]  disk 5, o:1, dev:sdh
> [   19.825679]  disk 6, o:1, dev:sdg
> [   19.899787] PM: Starting manual resume from disk
> [   28.663228] Filesystem "md0": Disabling barriers, not supported by the
> underlying device
> [   28.663228] XFS mounting filesystem md0
> [   28.884433] md: resync of RAID array md0
> [   28.884433] md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
> [   28.884433] md: using maximum available idle IO bandwidth (but not more
> than 200000 KB/sec) for resync.
> [   28.884433] md: using 128k window, over a total of 976759936 blocks.

This resync is why I think your data could well be lost.
If the metadata did somehow get relocated, but the data didn't, then the
resync will have updated all of the blocks that were thought to be parity
blocks.  All of those on sdg and sdh would almost certainly have been data
blocks, and that data would now be gone.
But there are still some big 'if's in there.
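To give a sense of the scale: with 7 devices and the default left-symmetric
layout (the "algorithm 2" in your dmesg), parity rotates across the disks, so
roughly one block in seven on sdg and sdh sits at a parity position and will
most likely have been rewritten by the resync.  A toy illustration of the
rotation (my sketch, nothing you need to run):

   # Left-symmetric layout, 7 raid disks: parity for stripe s is on
   # raid disk (6 - s % 7), so it cycles 6,5,4,...,0,6,...
   for s in 0 1 2 3 4 5 6 7; do
       echo "stripe $s: parity on raid disk $(( 6 - s % 7 ))"
   done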


> [   29.025980] Starting XFS recovery on filesystem: md0 (logdev: internal)
> [   32.680486] XFS: xlog_recover_process_data: bad clientid
> [   32.680495] XFS: log mount/recovery failed: error 5
> [   32.682773] XFS: log mount failed
> 
> I ran fdisk and flagged sdg1 and sdh1 as fd.

If, however, the md metadata had not been moved, and the array was previously
made of 5 partitions and two whole devices, then this action would have
corrupted some data early in the array, possibly making it impossible to
recover the xfs filesystem (not that it looked particularly recoverable
anyway).


> I tried to reassemble the array but it didn't work: no matter what was in
> mdadm.conf, it still uses sdg and sdh instead of sdg1 and sdh1.

This seems to confirm that the metadata we thought was on sdg1 and sdh1
wasn't actually there.  Running "mdadm --examine /dev/sdg1", for example,
would confirm it.
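For example (all read-only, so safe to run; the partition devices only once
the device nodes exist again):

   # Superblocks md actually assembled from:
   mdadm --examine /dev/sdg /dev/sdh
   # What, if anything, is on the partitions:
   mdadm --examine /dev/sdg1 /dev/sdh1
   # Compare UUID, Events and Update Time against a known-good member:
   mdadm --examine /dev/sdb1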


> I checked in /dev and I see no sdg1 and sdh1, which explains why it won't
> use them.

mdadm -S /dev/md0
blockdev --rereadpt /dev/sdg /dev/sdh

should fix that.
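Then check that the kernel really picked the partitions up again before doing
anything else:

   # Both partitions should be listed again.
   grep 'sd[gh]' /proc/partitions
   ls -l /dev/sdg1 /dev/sdh1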

> I just don't know why those partitions are gone from /dev and how to readd
> those...
> 
> blkid :
> /dev/sda1: LABEL="boot" UUID="519790ae-32fe-4c15-a7f6-f1bea8139409"
> TYPE="ext2" 
> /dev/sda2: TYPE="swap" 
> /dev/sda3: LABEL="root" UUID="91390d23-ed31-4af0-917e-e599457f6155"
> TYPE="ext3" 
> /dev/sdb1: UUID="2802e68a-dd11-c519-e8af-0d8f4ed72889" TYPE="mdraid" 
> /dev/sdc1: UUID="2802e68a-dd11-c519-e8af-0d8f4ed72889" TYPE="mdraid" 
> /dev/sdd1: UUID="2802e68a-dd11-c519-e8af-0d8f4ed72889" TYPE="mdraid" 
> /dev/sde1: UUID="2802e68a-dd11-c519-e8af-0d8f4ed72889" TYPE="mdraid" 
> /dev/sdf1: UUID="2802e68a-dd11-c519-e8af-0d8f4ed72889" TYPE="mdraid" 
> /dev/sdg: UUID="2802e68a-dd11-c519-e8af-0d8f4ed72889" TYPE="mdraid" 
> /dev/sdh: UUID="2802e68a-dd11-c519-e8af-0d8f4ed72889" TYPE="mdraid" 
> 
> fdisk -l :
> Disk /dev/sda: 40.0 GB, 40020664320 bytes
> 255 heads, 63 sectors/track, 4865 cylinders
> Units = cylinders of 16065 * 512 = 8225280 bytes
> Disk identifier: 0x8c878c87
> 
>    Device Boot      Start         End      Blocks   Id  System
> /dev/sda1   *           1          12       96358+  83  Linux
> /dev/sda2              13         134      979965   82  Linux swap / Solaris
> /dev/sda3             135        4865    38001757+  83  Linux
> 
> Disk /dev/sdb: 1000.2 GB, 1000204886016 bytes
> 255 heads, 63 sectors/track, 121601 cylinders
> Units = cylinders of 16065 * 512 = 8225280 bytes
> Disk identifier: 0x1e7481a5
> 
>    Device Boot      Start         End      Blocks   Id  System
> /dev/sdb1               1      121601   976760001   fd  Linux raid
> autodetect
> 
> Disk /dev/sdc: 1000.2 GB, 1000204886016 bytes
> 255 heads, 63 sectors/track, 121601 cylinders
> Units = cylinders of 16065 * 512 = 8225280 bytes
> Disk identifier: 0xc9bdc1e9
> 
>    Device Boot      Start         End      Blocks   Id  System
> /dev/sdc1               1      121601   976760001   fd  Linux raid
> autodetect
> 
> Disk /dev/sdd: 1000.2 GB, 1000204886016 bytes
> 255 heads, 63 sectors/track, 121601 cylinders
> Units = cylinders of 16065 * 512 = 8225280 bytes
> Disk identifier: 0xcc356c30
> 
>    Device Boot      Start         End      Blocks   Id  System
> /dev/sdd1               1      121601   976760001   fd  Linux raid
> autodetect
> 
> Disk /dev/sde: 1000.2 GB, 1000204886016 bytes
> 255 heads, 63 sectors/track, 121601 cylinders
> Units = cylinders of 16065 * 512 = 8225280 bytes
> Disk identifier: 0xe87f7a3d
> 
>    Device Boot      Start         End      Blocks   Id  System
> /dev/sde1               1      121601   976760001   fd  Linux raid
> autodetect
> 
> Disk /dev/sdf: 1000.2 GB, 1000204886016 bytes
> 255 heads, 63 sectors/track, 121601 cylinders
> Units = cylinders of 16065 * 512 = 8225280 bytes
> Disk identifier: 0xb17a2d22
> 
>    Device Boot      Start         End      Blocks   Id  System
> /dev/sdf1               1      121601   976760001   fd  Linux raid
> autodetect
> 
> Disk /dev/sdg: 1000.2 GB, 1000204886016 bytes
> 255 heads, 63 sectors/track, 121601 cylinders
> Units = cylinders of 16065 * 512 = 8225280 bytes
> Disk identifier: 0x8f3bce61
> 
>    Device Boot      Start         End      Blocks   Id  System
> /dev/sdg1               1      121601   976760001   fd  Linux raid
> autodetect
> 
> Disk /dev/sdh: 1000.2 GB, 1000204886016 bytes
> 255 heads, 63 sectors/track, 121601 cylinders
> Units = cylinders of 16065 * 512 = 8225280 bytes
> Disk identifier: 0xa98062ce
> 
>    Device Boot      Start         End      Blocks   Id  System
> /dev/sdh1               1      121601   976760001   fd  Linux raid
> autodetect
> 
> I really don't know what happened nor how to recover from this mess.
> Needless to say, the 5TB or so of data sitting on those disks is very
> valuable to me...
> 
> Any ideas, anyone?
> Did anybody ever experience a similar situation, or know how to recover from
> it?
> 
> Can someone help me? I'm really desperate... :x

I would check whether your /var/log files go back to the last reboot of this
system and see if they show how the array was assembled then.  If they do,
then collect any messages about md or raid from that time until now.
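Something along these lines should pull out the relevant messages; the log
file names here are just a guess and vary between distributions:

   # Collect md/raid related kernel and mdadm messages from recent boots.
   grep -iE 'md[0-9]|raid|mdadm' /var/log/messages* /var/log/kern.log* 2>/dev/null | less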

That might give some hints as to what happened, but I don't hold a lot of
hope that it will allow your data to be recovered.

NeilBrown

