Re: Panicked and deleted superblock

On Sun, Oct 30, 2016 at 08:43:00PM +0100, Andreas Klauer wrote:
> On Sun, Oct 30, 2016 at 07:23:00PM +0100, Peter Hoffmann wrote:
>> I assume that both processes - re-sync  and grow - raced
>> through the array and did their job.
> 
> Oi oi oi, it's still one process per raid, no races. Isn't it? 
> I'm not a kernel developer so I don't really *know* what happens 
> in this case, but in my imagination it should go something like 
> - disk that is not fully synced, treat as unsynced/degraded, repopulate.
> 
> Either that or it's actually smart enough to remember it synced up to X 
> and just does the right thing(tm), whatever that is. But that sounds 
> like having to write out a lot of special cases instead of handling 
> the degraded case you must be able to cope with anyhow.
> 
> You have to re-write it and recalculate all parity anyway since the grow 
> changes everything.
> 
> As long as it didn't consider your half a disk to be fully synced, 
> your data should be completely fine. The only question is - where. ;)
All right, but even if the re-sync stopped as I started growing, like
you wrote, nothing should be lost, as growing consumes more than it
writes, stripe-wise speaking:

| (D1) |  D2  |  D3  |      |  D1  |  D2  |  D3  | (D4) |
|------|------|------|      |------|------|------|------|
| (B1) |  B2  | P1,2 | ->   |  B1  |  B2  |  B3  |(P123)|
| (B4) | P3,4 |  B3  |      |  B5  |  B6  | P456 | (B4) |
| (P)  |  B5  |  B6  |      |  ?   | [B5] | [B6] |      |
Where () marks blocks that should be there but don't exist, and
      [] marks blocks that exist but shouldn't

>> And after running for a while - my NAS is very slow (partly because all
>> disks are LUKS'd), mdstat showed around 1GiB of Data processed - we had
>> a blackout.
> 
> Stop trying to scare me! I'm not scared. 
> You you you and your spine-chilling halloween horror story.
> 
> Slow because of LUKS? You don't have LUKS below your RAID layer, right?
> Right? (Right?)
Ehm, ehm, may I call my lawyer? ;-) Yes,

/dev/sda2 --luks--> /dev/mapper/HDD_0 \
/dev/sdb2 --luks--> /dev/mapper/HDD_1 --raid--> /dev/md127 -ext4-> /raid
/dev/sdc2 --luks--> /dev/mapper/HDD_2 /
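
For completeness, this is roughly how that stack gets assembled. A dry-run
sketch that only prints the commands (mapper names, md127 and /raid are from
the diagram above; all options are my assumption, and running it for real
obviously needs root and the actual disks):

```shell
# Print, rather than execute, the assembly of the stack drawn above:
# LUKS below the RAID layer, ext4 on top of md127.
cmds=$(
    for d in a:0 b:1 c:2; do
        echo "cryptsetup open --readonly /dev/sd${d%:*}2 HDD_${d#*:}"
    done
    echo "mdadm --assemble --readonly /dev/md127 /dev/mapper/HDD_0 /dev/mapper/HDD_1 /dev/mapper/HDD_2"
    echo "mount -o ro /dev/md127 /raid"
)
printf '%s\n' "$cmds"
```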

>> the RAID superblock is now lost
> 
> Other people have proved Murphy's Law before, you know, 
> why bother doing it again?
> 
>> My idea is to look for that magic number of the ext4-fs to find the
>> beginning of Block 1 on Disk 1, then I would copy a reasonable amount
>> of data and try to figure out how big Block 1 and hence chunk-size is -
>> perhaps fsck.ext4 can help do that?
> 
> Determining the data offset, that's fine, only one thing to consider.
> Growing RAIDs changes that very offset you're looking for, so.
> Even if you find it, it's still wrong.
> 
>> One thing I'm wondering is if I got the layout right. And the other
>> might be rather a case for the ext4-mailing list but I'd ask it anyway:
>> how can I figure where the file system starts to be corrupted?
> 
> Let's not care about your filesystem for now. Also forget fsck.
> 
> It's dangerous to go alone. Take this.
> 
> https://raid.wiki.kernel.org/index.php/Recovering_a_failed_software_RAID#Making_the_harddisks_read-only_using_an_overlay_file
> 
> Create two overlays. Two. Okay?
> 
> Overlay #1: You create your not-fully-grown 4 disk raid.
> 
> You have to figure out the disk order, raid level, metadata version,
* disk order seems pretty obvious to me: _UU and later UUU_
* raid level is 5
* 1) data offset on the grown array seems to be 0x100000
     (at least I find ext4's magic signature at 0x100000+0x400+0x38)
  2) no idea where it might be for the unsynced version
* chunk size: no idea, I might have adjusted it from the default
  value for better alignment with the file system
* layout should be the default left-symmetric
  (all diagrams in the original mail are wrong, as data blocks in a
  stripe start after the parity block, not with the first disk)
* anything else?
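
For the data-offset point, here is the magic-number hunt as a minimal sketch
(needs GNU grep for -P and bash for printf \x escapes; demonstrated on a
scratch file so nothing real is touched, for the real thing DEV would point
read-only at e.g. /dev/mapper/HDD_0):

```shell
# The ext4 superblock starts 0x400 bytes into the filesystem and its
# s_magic field (0xEF53, stored little-endian as bytes 53 ef) sits at
# +0x38 within it, so the filesystem begins at match_offset - 0x438.
DEV=$(mktemp)
printf '\x53\xef' | dd of="$DEV" bs=1 seek=$((0x100438)) conv=notrunc status=none

# -b prints the byte offset of each match, -o only the matched bytes,
# -a forces text mode on binary input
off=$(LC_ALL=C grep -obaP '\x53\xef' "$DEV" | head -n1 | cut -d: -f1)
fs_start=$((off - 0x438))
printf 'magic at 0x%x -> filesystem starts at 0x%x\n' "$off" "$fs_start"
# prints: magic at 0x100438 -> filesystem starts at 0x100000
rm -f "$DEV"
```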


> data offset, chunk size, layout, and some things I don't remember. 
> If you got it right there should be a filesystem on the raid device. 
> Or a LUKS header. Or something that makes any sense whatsoever at 
> least for however far the reshape actually progressed.
> 
> Overlay #2: You create your not-fully-synced 3 disk raid.
>             Leaving the not-fully-synced disk as missing.
> 
> Basically this is the same thing as #1, except the data offset 
> might be different, there's obviously no 4th disk, and one of 
> the other three missing.
> 
> There probably WON'T be a filesystem on this one because it's 
> already grown over. So the beginning of this device is garbage, 
> it only starts making sense after the area that wasn't reshaped.
So I'm looking for a sequence of bytes that is duplicated on both
overlays. This way I find the border between both parts.
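
To make sure I do the overlays right: per the wiki page it boils down to one
dm snapshot per member, backed by a sparse copy-on-write file. A dry-run
sketch that only prints the commands (the sector count is invented, the real
value comes from blockdev --getsz; all paths are assumptions):

```shell
# Assumed member size in 512-byte sectors; for real use take
# `blockdev --getsz /dev/mapper/$dev` per device.
SECTORS=5860528128
cmds=$(
    for dev in HDD_0 HDD_1 HDD_2; do   # plus the 4th member for overlay #1
        echo "truncate -s 4G /tmp/overlay-$dev"              # sparse CoW file
        echo "loop=\$(losetup -f --show /tmp/overlay-$dev)"
        echo "dmsetup create ov_$dev --table '0 $SECTORS snapshot /dev/mapper/$dev \$loop P 8'"
    done
)
printf '%s\n' "$cmds"
```

Then all the mdadm --create experiments run only against /dev/mapper/ov_*,
never against the real members.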

> If it was unencrypted... oh well. It wasn't. Was it?
> Now you've done it, I'm confused.
> 
> Then you find the point where data overlaps and create a linear mapping. 
> It overlaps because 4 disks pack the data into less space than 3, so
> writing 1GB in the new layout won't overwrite the full 1GB it was read
> from in the old one; there is an overlapping zone.
> 
> And you're done. At least in terms of having access to the whole thing.
> 
> Easy peasy.
> 
> Regards
> Andreas Klauer
Thank you, that overlay file system is the way to go
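
And for when I get there, the final stitch should be a single dm linear
table: head of the device from the grown 4-disk overlay, tail from the
degraded 3-disk overlay at the same logical offset. A sketch with invented
numbers (BOUNDARY is the crossover sector the byte-compare has to find,
TOTAL the usable size of the old array, and both ov_* names are placeholders):

```shell
BOUNDARY=2097152        # assumed: first sector not yet reshaped
TOTAL=11721056256       # assumed: usable size of the 3-disk array, in sectors
table=$(printf '0 %s linear /dev/mapper/ov_4disk 0\n%s %s linear /dev/mapper/ov_3disk %s\n' \
    "$BOUNDARY" "$BOUNDARY" "$((TOTAL - BOUNDARY))" "$BOUNDARY")
printf '%s\n' "$table"
# then: printf '%s\n' "$table" | dmsetup create rescued
```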

> PS: Do you _really_ not have anything left. Logfiles? Anything?
>     Maybe you asked anything about your raid anywhere before 
>     and posted examine along with it, tucked away in some 
>     linux forum or chat you might have perused...
> 
>     Please check. Your story is really interesting but nothing 
>     beats hard facts such as actual output of your crap.
I'd be happy to have any such things, but I never had any trouble before.
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


