On Mon, Oct 31 2016, Peter Hoffmann wrote:

> My problem is the result of working late and not informing myself
> beforehand. I'm fully aware that I should have had a backup, been less
> spontaneous and more cautious.
>
> The initial situation is a RAID-5 array with three disks. I assume it
> to look as follows:
>
> | Disk 1   | Disk 2   | Disk 3   |
> |----------|----------|----------|
> | out      | Block 2  | P(1,2)   |
> | of       | P(3,4)   | Block 4  |   degraded but working
> | sync     | Block 5  | Block 6  |

The default RAID5 layout (there are 4 to choose from) is

  #define ALGORITHM_LEFT_SYMMETRIC 2 /* Rotating Parity N with Data Continuation */

The first data block in a stripe is always located immediately after
the parity block.  So if the data is D0 D1 D2 D3 ..., the layout is

   D0   D1   P01
   D3   P23  D2
   P45  D4   D5
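If it helps to see that spelled out, the placement can be computed
along these lines (a rough Python sketch for illustration only, not
code taken from md; "chunk" here means one of the D0/D1/... blocks
above):

  # Left-symmetric RAID5 placement: for a given data chunk number,
  # work out which stripe and disk it lands on, and which disk holds
  # the parity for that stripe.
  def left_symmetric(chunk, ndisks):
      ndata = ndisks - 1                    # data chunks per stripe
      stripe = chunk // ndata
      parity_disk = (ndisks - 1 - stripe) % ndisks   # parity rotates "left"
      # data starts on the disk just after parity and wraps around
      data_disk = (parity_disk + 1 + chunk % ndata) % ndisks
      return stripe, data_disk, parity_disk

  # Reproduces the 3-disk picture above: D0 D1 P / D3 P D2 / P D4 D5
  for c in range(6):
      s, d, p = left_symmetric(c, 3)
      print(f"D{c}: stripe {s}, data on disk {d}, parity on disk {p}")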
> Then I started the re-sync:
>
> | Disk 1   | Disk 2   | Disk 3   |
> |----------|----------|----------|
> | Block 1  | Block 2  | P(1,2)   |
> | Block 3  | P(3,4)   | Block 4  |   already synced
> | P(5,6)   | Block 5  | Block 6  |
>     .          .          .
> | out      | Block b  | P(a,b)   |
> | of       | P(c,d)   | Block d  |   not yet synced
> | sync     | Block e  | Block f  |
>
> But I didn't wait for it to finish as I actually wanted to add a
> fourth disk and so started a grow process. But I just changed the size
> of the array, I didn't actually add the fourth disk (don't ask why, I
> cannot recall it). I assume that both processes - re-sync and grow -
> raced through the array and did their job.

So you ran

  mdadm --grow /dev/md0 --raid-disks 4 --force

??? You would need --force or mdadm would refuse to do such a silly
thing.  Also, the kernel would refuse to let a reshape start while a
resync was on-going, so the reshape attempt should have been rejected
anyway.

> | Disk 1   | Disk 2   | Disk 3   |
> |----------|----------|----------|
> | Block 1  | Block 2  | Block 3  |
> | Block 4  | Block 5  | P(4,5,6) |   with four disks but degraded
> | Block 7  | P(7,8,9) | Block 8  |
>     .          .          .
> | Block a  | Block b  | P(a,b)   |
> | Block c  | P(c,d)   | Block d  |   not yet grown but synced
> | P(e,f)   | Block e  | Block f  |
>     .          .          .
> | out      | Block V  | P(U,V)   |
> | of       | P(W,X)   | Block X  |   not yet synced
> | sync     | Block Y  | Block Z  |
>
> And after running for a while - my NAS is very slow (partly because
> all disks are LUKS'd), mdstat showed around 1 GiB of data processed -
> we had a blackout. Water dripped into a power strip and *poff*. After
> a reboot I wanted to reassemble everything, didn't know what I was
> doing, and so the RAID superblock is now lost and I failed to
> reassemble (this is the part I really can't recall, I panicked). I
> never wrote anything to the actual array, so I assume - better: hope -
> that no actual data is lost.

So you deliberately erased the RAID superblock?  Presumably not.
Maybe you ran "mdadm --create ...." to try to create a new array?
That would do it.

If the reshape hadn't actually started, then you have some chance of
recovering your data.  If it had, then recovery is virtually impossible
because you don't know how far it got.

> I have a plan but wanted to check with you before doing anything
> stupid again.
> My idea is to look for the magic number of the ext4 fs to find the
> beginning of Block 1 on Disk 1; then I would copy a reasonable amount
> of data and try to figure out how big Block 1, and hence the chunk
> size, is - perhaps fsck.ext4 can help with that? After that I would
> copy another reasonable amount of data from Disks 1-3 to figure out
> the border between the grown stripes and the synced stripes. And from
> there on I'd have my data in a defined state from which I can save the
> whole file system.
> One thing I'm wondering is whether I got the layout right. The other
> question might rather be a case for the ext4 mailing list, but I'll
> ask it anyway: how can I figure out where the file system starts to be
> corrupted?

You might be able to make something like this work ... if the reshape
hadn't started.

But if you can live without recovering the data, then that is probably
the more cost-effective option.
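If you do go looking for the ext4 magic: the superblock sits 1024
bytes into the filesystem, and the little-endian magic 0xEF53 is 0x38
bytes into the superblock.  Something along these lines would turn up
candidate start offsets (a rough Python sketch; the device name, step,
and scan range are just placeholders, and since the disks are LUKS'd it
would have to run against the opened/decrypted device, not the raw
disk):

  import struct

  DEVICE = "/dev/mapper/disk1"     # placeholder: opened LUKS mapping of disk 1
  STEP = 512                       # the fs start should be at least sector-aligned
  SCAN_LIMIT = 64 * 1024 * 1024    # only search the first 64 MiB here

  with open(DEVICE, "rb") as f:
      for off in range(0, SCAN_LIMIT, STEP):
          # s_magic is 0x38 bytes into the superblock, which itself
          # starts 1024 bytes into the filesystem
          f.seek(off + 1024 + 0x38)
          if f.read(2) != b"\x53\xef":
              continue
          # sanity-check a couple of fields so a stray 0xEF53 doesn't fool us
          f.seek(off + 1024)
          sb = f.read(64)
          blocks_count = struct.unpack_from("<I", sb, 4)[0]     # s_blocks_count_lo
          log_block_size = struct.unpack_from("<I", sb, 24)[0]  # s_log_block_size
          if blocks_count and log_block_size <= 6:
              print("possible ext4 superblock; fs would start at offset", off)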
NeilBrown