RE: fd partitions gone from 2 discs, md happy with it and reconstructs... bye bye datas

Thanks a lot for your help. I think I have sunk even deeper in the meantime...

I was able to recover the /dev/sdg1 and /dev/sdh1 entries using partprobe.

I then zeroed the superblocks on all disks, recreated the array with
--assume-clean and started the resync.
Then I received your answer and, as you advised, searched through older logs
and discovered... that at the last upgrade I made the mistake of adding sdg
and sdh to the array, and not sdg1 and sdh1 as I believed I had!...
So I quickly stopped the sync (it had only been running for a few minutes),
deleted the sdg1 and sdh1 partitions, zeroed the superblocks and re-created
the array again with --assume-clean. As of now, it's syncing again...
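
For what it's worth, the sequence both times was roughly the following
(reconstructed from memory, so the exact options and device list may be
slightly off; the first time with sdg1/sdh1, the second with the whole
sdg/sdh devices):

  # approximate reconstruction from memory -- not a verbatim history
  mdadm --stop /dev/md0
  mdadm --zero-superblock /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1 /dev/sdg1 /dev/sdh1
  mdadm --create /dev/md0 --level=5 --raid-devices=7 --assume-clean \
        /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1 /dev/sdg1 /dev/sdh1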

It's probably hopeless now, isn't it?... :-(


Phil.

-----Original Message-----
From: linux-raid-owner@xxxxxxxxxxxxxxx
[mailto:linux-raid-owner@xxxxxxxxxxxxxxx] On Behalf Of NeilBrown
Sent: Tuesday, 4 January 2011 12:04
To: Philippe PIOLAT
Cc: linux-raid@xxxxxxxxxxxxxxx
Subject: Re: fd partitions gone from 2 discs, md happy with it and
reconstructs... bye bye datas

On Tue, 4 Jan 2011 10:11:10 +0100 "Philippe PIOLAT" <piolat@xxxxxxxxxxxx>
wrote:

> Hey gurus, need some help badly with this one.
> I run a server with a 6Tb md raid5 volume built over 7*1Tb disks.
> I had to shut down the server recently and when it came back up, 2 
> out of the 7 disks used for the raid volume had lost their conf:

I should say up front that I suspect you have lost your data.  However, there
is enough here that doesn't make sense that I cannot be certain of anything.

> 
> dmesg :
> [   10.184167]  sda: sda1 sda2 sda3 // System disk
> [   10.202072]  sdb: sdb1
> [   10.210073]  sdc: sdc1
> [   10.222073]  sdd: sdd1
> [   10.229330]  sde: sde1
> [   10.239449]  sdf: sdf1
> [   11.099896]  sdg: unknown partition table
> [   11.255641]  sdh: unknown partition table

If sdg and sdh had a partition table before, but don't now, then at least
the first block of each of those devices has been corrupted.  In that case
we must assume that an unknown number of blocks at the start of those drives
has been corrupted, and you could already have lost critical data at this
point; nothing you could have done would have helped.
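
If you want to see what is actually in the first sector of those devices now,
something like this (read-only) would show it:

  dd if=/dev/sdg bs=512 count=1 2>/dev/null | hexdump -C
  dd if=/dev/sdh bs=512 count=1 2>/dev/null | hexdump -C

All zeroes, random-looking data, or recognisable filesystem or partition-table
bytes would each tell a slightly different story.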


> 
> All 7 disks have same geometry and were configured alike :
> 
> fdisk :
> Disk /dev/sdb: 1000.2 GB, 1000204886016 bytes
> 255 heads, 63 sectors/track, 121601 cylinders
> Units = cylinders of 16065 * 512 = 8225280 bytes
> Disk identifier: 0x1e7481a5
> 
>    Device Boot      Start         End      Blocks   Id  System
> /dev/sdb1               1      121601   976760001   fd  Linux raid autodetect

So the partition started 16065 sectors from the start of the device.
This is not a multiple of 64K, which is good.
If a partition starts at a multiple of 64K from the start of the device and
extends to the end of the device, then the md metadata on the partition
could look like it was on the disk as well.
When mdadm sees a situation like this it will complain, but that cannot have
been happening to you.
So when the partition table was destroyed, mdadm should not have been able to
see the metadata that belonged to the partition.
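
To spell the arithmetic out: 16065 sectors * 512 bytes = 8225280 bytes, and
8225280 = 125 * 65536 + 33280, so the partition does not start on a 64K
boundary.  A quick way to check any start offset:

  echo $(( (16065 * 512) % 65536 ))   # prints 33280; a 64K-aligned start would print 0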


> 
> All 7 disks (sdb1, sdc1, sdd1, sde1, sdf1, sdg1, sdh1) were used in a
> md raid5 xfs volume.
> When booting, md, which was (obviously) out of sync, kicked in and
> automatically started rebuilding over the 7 disks, including the two
> "faulty" ones; xfs tried to do some shenanigans as well:
> 
> dmesg :
>  [   19.566941] md: md0 stopped.
> [   19.817038] md: bind<sdc1>
> [   19.817339] md: bind<sdd1>
> [   19.817465] md: bind<sde1>
> [   19.817739] md: bind<sdf1>
> [   19.817917] md: bind<sdh>
> [   19.818079] md: bind<sdg>
> [   19.818198] md: bind<sdb1>
> [   19.818248] md: md0: raid array is not clean -- starting background reconstruction
> [   19.825259] raid5: device sdb1 operational as raid disk 0
> [   19.825261] raid5: device sdg operational as raid disk 6
> [   19.825262] raid5: device sdh operational as raid disk 5
> [   19.825264] raid5: device sdf1 operational as raid disk 4
> [   19.825265] raid5: device sde1 operational as raid disk 3
> [   19.825267] raid5: device sdd1 operational as raid disk 2
> [   19.825268] raid5: device sdc1 operational as raid disk 1
> [   19.825665] raid5: allocated 7334kB for md0
> [   19.825667] raid5: raid level 5 set md0 active with 7 out of 7 devices, algorithm 2

... however it is clear that mdadm (and md) saw metadata at the end of the
device which exactly matched the metadata on the other devices in the array.

This is very hard to explain.  I can only think of three explanations, none
of which seems particularly likely (one possible check is sketched after the
list):

1/ The partition table on sdg and sdh actually placed the first partition at
    a multiple of 64K unlike all the other devices in the array.
2/ Someone copied the superblock from the end of sdg1 to the end of sdg, and
   also from the end of sdh1 to the end of sdh.
   Given that the first block of both devices was changed too, a command like:
     dd if=/dev/sdg1 of=/dev/sdg
   would have done it.  But that seems extremely unlikely.
3/ The array previously consisted of 5 partitions and 2 whole devices.
   I have certainly seen this happen before, usually by accident.
   But if this were the case, your data should all be intact.  Yet it isn't.
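
One way to narrow these down, once the partition entries are back, is to work
out where a 0.90 superblock would sit on the whole device as opposed to the
partition; the two only land on the same physical sectors when the partition
starts on a 64K boundary and runs to the end of the disk (the situation
described above).  A rough sketch, assuming 0.90 metadata and 512-byte
sectors:

  for d in /dev/sdg /dev/sdg1 /dev/sdh /dev/sdh1; do
      sz=$(blockdev --getsz $d)    # size in 512-byte sectors
      echo "$d: 0.90 superblock expected at sector $(( (sz & ~127) - 128 ))"
  done

Comparing those offsets (remembering that the partition's offset is relative
to the partition start) with where --examine actually reports metadata would
tell you which of the above is plausible.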



> [   19.825669] RAID5 conf printout:
> [   19.825670]  --- rd:7 wd:7
> [   19.825671]  disk 0, o:1, dev:sdb1
> [   19.825672]  disk 1, o:1, dev:sdc1
> [   19.825673]  disk 2, o:1, dev:sdd1
> [   19.825675]  disk 3, o:1, dev:sde1
> [   19.825676]  disk 4, o:1, dev:sdf1
> [   19.825677]  disk 5, o:1, dev:sdh
> [   19.825679]  disk 6, o:1, dev:sdg
> [   19.899787] PM: Starting manual resume from disk
> [   28.663228] Filesystem "md0": Disabling barriers, not supported by the underlying device
> [   28.663228] XFS mounting filesystem md0
> [   28.884433] md: resync of RAID array md0
> [   28.884433] md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
> [   28.884433] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for resync.
> [   28.884433] md: using 128k window, over a total of 976759936 blocks.

This resync is why I think your data could well be lost.
If the metadata did somehow get relocated, but the data didn't, then this
will have updated all of the blocks that were thought to be parity blocks.
All of those on sdg and sdh would almost certainly have been data blocks,
and that data would now be gone.
But there are still some big 'if's in there.


> [   29.025980] Starting XFS recovery on filesystem: md0 (logdev: internal)
> [   32.680486] XFS: xlog_recover_process_data: bad clientid
> [   32.680495] XFS: log mount/recovery failed: error 5
> [   32.682773] XFS: log mount failed
> 
> I ran fdisk and flagged sdg1 and sdh1 as fd.

If, however, the md metadata had not been moved, and the array was
previously made of 5 partitions and two devices, then this action would have
corrupted some data early in the array, possibly making it impossible to
recover the xfs filesystem (not that it looked like it was particularly
recoverable anyway).


> I tried to reassemble the array but it didn't work: no matter what was 
> in mdadm.conf, it still uses sdg and sdh instead of sdg1 and sdh1.

This seems to confirm that the metadata that we thought was on sdg1 and sdh1
wasn't.  Using "mdadm --examine /dev/sdg1" for example would confirm.


> I checked in /dev and I see no sdg1 and sdh1, which explains why 
> it won't use them.

mdadm -S /dev/md0
blockdev --rereadpt /dev/sdg /dev/sdh

should fix that.
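
(If blockdev is not available, I believe "partprobe /dev/sdg /dev/sdh" from
parted does much the same thing: it just asks the kernel to re-read the
partition tables.)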

> I just don't know why those partitions are gone from /dev and how to 
> re-add those...
> 
> blkid :
> /dev/sda1: LABEL="boot" UUID="519790ae-32fe-4c15-a7f6-f1bea8139409"
> TYPE="ext2" 
> /dev/sda2: TYPE="swap" 
> /dev/sda3: LABEL="root" UUID="91390d23-ed31-4af0-917e-e599457f6155"
> TYPE="ext3" 
> /dev/sdb1: UUID="2802e68a-dd11-c519-e8af-0d8f4ed72889" TYPE="mdraid" 
> /dev/sdc1: UUID="2802e68a-dd11-c519-e8af-0d8f4ed72889" TYPE="mdraid" 
> /dev/sdd1: UUID="2802e68a-dd11-c519-e8af-0d8f4ed72889" TYPE="mdraid" 
> /dev/sde1: UUID="2802e68a-dd11-c519-e8af-0d8f4ed72889" TYPE="mdraid" 
> /dev/sdf1: UUID="2802e68a-dd11-c519-e8af-0d8f4ed72889" TYPE="mdraid" 
> /dev/sdg: UUID="2802e68a-dd11-c519-e8af-0d8f4ed72889" TYPE="mdraid" 
> /dev/sdh: UUID="2802e68a-dd11-c519-e8af-0d8f4ed72889" TYPE="mdraid" 
> 
> fdisk -l :
> Disk /dev/sda: 40.0 GB, 40020664320 bytes
> 255 heads, 63 sectors/track, 4865 cylinders
> Units = cylinders of 16065 * 512 = 8225280 bytes
> Disk identifier: 0x8c878c87
> 
>    Device Boot      Start         End      Blocks   Id  System
> /dev/sda1   *           1          12       96358+  83  Linux
> /dev/sda2              13         134      979965   82  Linux swap / Solaris
> /dev/sda3             135        4865    38001757+  83  Linux
> 
> Disk /dev/sdb: 1000.2 GB, 1000204886016 bytes
> 255 heads, 63 sectors/track, 121601 cylinders
> Units = cylinders of 16065 * 512 = 8225280 bytes
> Disk identifier: 0x1e7481a5
> 
>    Device Boot      Start         End      Blocks   Id  System
> /dev/sdb1               1      121601   976760001   fd  Linux raid autodetect
> 
> Disk /dev/sdc: 1000.2 GB, 1000204886016 bytes
> 255 heads, 63 sectors/track, 121601 cylinders
> Units = cylinders of 16065 * 512 = 8225280 bytes
> Disk identifier: 0xc9bdc1e9
> 
>    Device Boot      Start         End      Blocks   Id  System
> /dev/sdc1               1      121601   976760001   fd  Linux raid autodetect
> 
> Disk /dev/sdd: 1000.2 GB, 1000204886016 bytes
> 255 heads, 63 sectors/track, 121601 cylinders
> Units = cylinders of 16065 * 512 = 8225280 bytes
> Disk identifier: 0xcc356c30
> 
>    Device Boot      Start         End      Blocks   Id  System
> /dev/sdd1               1      121601   976760001   fd  Linux raid autodetect
> 
> Disk /dev/sde: 1000.2 GB, 1000204886016 bytes
> 255 heads, 63 sectors/track, 121601 cylinders
> Units = cylinders of 16065 * 512 = 8225280 bytes
> Disk identifier: 0xe87f7a3d
> 
>    Device Boot      Start         End      Blocks   Id  System
> /dev/sde1               1      121601   976760001   fd  Linux raid autodetect
> 
> Disk /dev/sdf: 1000.2 GB, 1000204886016 bytes
> 255 heads, 63 sectors/track, 121601 cylinders
> Units = cylinders of 16065 * 512 = 8225280 bytes
> Disk identifier: 0xb17a2d22
> 
>    Device Boot      Start         End      Blocks   Id  System
> /dev/sdf1               1      121601   976760001   fd  Linux raid autodetect
> 
> Disk /dev/sdg: 1000.2 GB, 1000204886016 bytes
> 255 heads, 63 sectors/track, 121601 cylinders
> Units = cylinders of 16065 * 512 = 8225280 bytes
> Disk identifier: 0x8f3bce61
> 
>    Device Boot      Start         End      Blocks   Id  System
> /dev/sdg1               1      121601   976760001   fd  Linux raid autodetect
> 
> Disk /dev/sdh: 1000.2 GB, 1000204886016 bytes
> 255 heads, 63 sectors/track, 121601 cylinders
> Units = cylinders of 16065 * 512 = 8225280 bytes
> Disk identifier: 0xa98062ce
> 
>    Device Boot      Start         End      Blocks   Id  System
> /dev/sdh1               1      121601   976760001   fd  Linux raid autodetect
> 
> I really don't know what happened nor how to recover from this mess.
> Needless to say, the 5TB or so worth of data sitting on those disks is 
> very valuable to me...
> 
> Any ideas, anyone?
> Has anybody ever experienced a similar situation, or does anyone know how 
> to recover from it?
> 
> Can someone help me? I'm really desperate... :x

I would see if your /var/log files go back to the last reboot of this system
and check whether they show how the array was assembled then.  If they do,
collect any messages about md or raid from that time until now.
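
Something along these lines should pull out the interesting bits, depending
on where your distribution keeps kernel logs:

  grep -iE 'md[0-9]|raid|bind<' /var/log/messages* /var/log/syslog* /var/log/kern.log* 2>/dev/null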

That might give some hints as to what happened, but I don't hold a lot of
hope that it will allow your data to be recovered.

NeilBrown


