On Sun, Oct 30, 2016 at 07:23:00PM +0100, Peter Hoffmann wrote:
> I assume that both processes - re-sync and grow - raced
> through the array and did their job.

Oi oi oi, it's still one process per raid, no races. Isn't it?

I'm not a kernel developer, so I don't really *know* what happens in
this case, but in my imagination it should go something like this: a
disk that is not fully synced is treated as unsynced/degraded and gets
repopulated. Either that, or it's actually smart enough to remember it
synced up to X and just does the right thing(tm), whatever that is. But
that sounds like having to write a lot of special cases, instead of just
handling the degraded case you must be able to cope with anyhow. You
have to rewrite it and recalculate all parity anyway, since the grow
changes everything.

As long as it didn't consider your half-synced disk to be fully synced,
your data should be completely fine. The only question is: where. ;)

> And after running for a while - my NAS is very slow (partly because all
> disks are LUKS'd), mdstat showed around 1GiB of Data processed - we had
> a blackout.

Stop trying to scare me! I'm not scared. You you you and your
spine-chilling Halloween horror story.

Slow because of LUKS? You don't have LUKS below your RAID layer, right?
Right? (Right?)

> the RAID superblock is now lost

Other people have proved Murphy's Law before, you know. Why bother
doing it again?

> My idea is to look for that magic number of the ext4-fs to find the
> beginning of Block 1 on Disk 1, then I would copy a reasonable amount
> of data and try to figure out how big Block 1 and hence chunk-size is -
> perhaps fsck.ext4 can help do that?

Determining the data offset that way is fine, there's only one thing to
consider: growing a raid changes the very offset you're looking for. So
even if you find it, it's only right for the part that was already
reshaped, and still wrong for the rest.

> One thing I'm wondering is if I got the layout right.
> And the other might rather be a case for the ext4 mailing list, but
> I'd ask it anyway: how can I figure out where the file system starts
> to be corrupted?

Let's not care about your filesystem for now. Also, forget fsck.

It's dangerous to go alone. Take this.

https://raid.wiki.kernel.org/index.php/Recovering_a_failed_software_RAID#Making_the_harddisks_read-only_using_an_overlay_file

Create two overlays. Two. Okay?

Overlay #1: You create your not-fully-grown 4-disk raid. You have to
figure out the disk order, raid level, metadata version, data offset,
chunk size, layout, and some things I don't remember. If you got it
right, there should be a filesystem on the raid device. Or a LUKS
header. Or something that makes any sense whatsoever, at least for
however far the reshape actually progressed.

Overlay #2: You create your not-fully-synced 3-disk raid, leaving the
not-fully-synced disk as missing. Basically this is the same thing as
#1, except the data offset might be different, there's obviously no 4th
disk, and one of the three is missing. There probably WON'T be a
filesystem on this one, because its beginning has already been grown
over. So the beginning of this device is garbage; it only starts making
sense past the point the reshape had reached. If it was unencrypted...
oh well. It wasn't. Was it? Now you've done it, I'm confused.

Then you find the point where the data overlaps and create a linear
mapping. It overlaps because 4 disks hold more data per stripe than 3:
the first 1GB written in the 4-disk layout takes up less room on each
disk than the same 1GB did in the 3-disk layout, so it doesn't
overwrite all of it, and there is a zone that both raids still show
correctly.

And you're done. At least in terms of having access to the whole thing.
Easy peasy.

Regards
Andreas Klauer

PS: Do you _really_ not have anything left? Logfiles? Anything? Maybe
you asked about your raid somewhere before and posted mdadm --examine
output along with it, tucked away in some Linux forum or chat you might
have perused... Please check. Your story is really interesting, but
nothing beats hard facts such as the actual output of your crap.
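Peter's idea of searching for the ext4 magic number can be sketched in Python as below. It is an illustration under assumptions, not a tested recovery tool. It assumes the filesystem sits directly on the array: inside LUKS there is no ext4 magic to find, only the LUKS header. The function names are made up for this example. It relies on the standard ext2/3/4 superblock layout, where the primary superblock starts 1024 bytes into the filesystem and its magic 0xEF53 is a little-endian 16-bit field 0x38 bytes into it. Backup superblocks carry the same magic, so each hit also reports s_block_group_nr. That value is 0 for the primary copy. For a backup, the reported position is simply 1024 bytes before the backup copy, not a filesystem start.

```python
import struct
from typing import BinaryIO, Iterator, List, Tuple

# ext2/3/4 superblock fields used here (offsets within the superblock)
SB_OFFSET = 1024       # primary superblock starts 1024 bytes into the fs
MAGIC_OFF = 0x38       # s_magic, little-endian u16
LOG_BS_OFF = 0x18      # s_log_block_size, u32: block size = 1024 << value
GROUP_NR_OFF = 0x5A    # s_block_group_nr, u16: 0 for the primary copy
EXT_MAGIC = 0xEF53
SB_BYTES = GROUP_NR_OFF + 2  # how much of the superblock we need to read

Hit = Tuple[int, int, int]  # (candidate fs start, block size, block group nr)


def find_ext4_candidates(buf: bytes, base: int = 0, step: int = 512) -> List[Hit]:
    """Test every `step`-aligned offset in buf as a possible filesystem start.

    `base` is the absolute byte offset of buf[0] on the device or image.
    """
    hits: List[Hit] = []
    last = len(buf) - SB_OFFSET - SB_BYTES
    for start in range(0, last + 1, step):
        sb = start + SB_OFFSET
        (magic,) = struct.unpack_from("<H", buf, sb + MAGIC_OFF)
        if magic != EXT_MAGIC:
            continue
        (log_bs,) = struct.unpack_from("<I", buf, sb + LOG_BS_OFF)
        if log_bs > 6:  # reject implausible block sizes (> 64 KiB)
            continue
        (group_nr,) = struct.unpack_from("<H", buf, sb + GROUP_NR_OFF)
        hits.append((base + start, 1024 << log_bs, group_nr))
    return hits


def scan_image(f: BinaryIO, window: int = 64 << 20, step: int = 512) -> Iterator[Hit]:
    """Scan a (read-only!) device or image window by window, not all at once."""
    assert window % step == 0
    overlap = SB_OFFSET + SB_BYTES  # so starts near a window edge still get checked
    offset = 0
    while True:
        f.seek(offset)
        buf = f.read(window + overlap)
        for hit in find_ext4_candidates(buf, base=offset, step=step):
            if hit[0] < offset + window:  # the next window reports the rest
                yield hit
        if len(buf) < window + overlap:  # reached the end
            break
        offset += window
```

Run it only against an overlay or an image, e.g. `with open(path, "rb") as f:` followed by `for hit in scan_image(f): print(hit)`, and look at the hits whose group number is 0.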
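The overlay step from the wiki page linked above comes down to one device-mapper snapshot per disk. Reads fall through to the real disk, and writes go to a sparse file attached to a loop device. Here is a rough Python sketch that only prints the shell commands and runs nothing. The paths, loop device and mapping names are hypothetical. The size is whatever `blockdev --getsz` reports for the disk, in 512-byte sectors. Build one such set per raid you want to try, as suggested above.

```python
import shlex
from typing import List


def snapshot_table(origin: str, cow: str, size_sectors: int, chunk_sectors: int = 8) -> str:
    """device-mapper snapshot table line:
    <start> <length> snapshot <origin> <COW device> <P|N> <chunk size in sectors>
    Unwritten chunks are read from `origin`; every write lands in `cow`.
    """
    return f"0 {size_sectors} snapshot {origin} {cow} P {chunk_sectors}"


def overlay_commands(disk: str, overlay_file: str, loop: str, name: str,
                     size_sectors: int, overlay_size: str = "4T") -> List[str]:
    """Shell commands (returned as strings, not executed) for one disk's overlay."""
    q = shlex.quote
    return [
        f"truncate -s {overlay_size} {q(overlay_file)}",  # sparse, grows only on write
        f"losetup {q(loop)} {q(overlay_file)}",
        f"echo {q(snapshot_table(disk, loop, size_sectors))} | dmsetup create {q(name)}",
    ]
```

Printing instead of executing is deliberate. It lets you double-check every device name before anything touches the disks.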
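Re-creating each raid on its overlays means running `mdadm --create --assume-clean` with every geometry parameter spelled out, since nothing can be left to defaults you can't verify. The following is a minimal sketch that only builds the command line. All values are placeholders you still have to figure out: disk order, level, chunk size, data offset, layout and metadata version. The device names are hypothetical, and `--data-offset` needs an mdadm version that supports it. Point it at overlay devices only, never at the real disks.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Geometry:
    level: int                       # raid level, e.g. 5
    chunk_kib: int                   # mdadm's --chunk is in KiB
    data_offset_kib: int             # per-device data offset, in KiB
    metadata: str = "1.2"
    layout: str = "left-symmetric"   # mdadm's default RAID5 layout


def mdadm_create_argv(md: str, devices: Sequence[str], geo: Geometry) -> List[str]:
    """Build an `mdadm --create` argv for re-creating an array on overlays.

    Device order matters (slot 0 first); use "missing" for an absent member.
    --assume-clean keeps mdadm from starting a resync.
    """
    return [
        "mdadm", "--create", md, "--assume-clean",
        f"--level={geo.level}",
        f"--raid-devices={len(devices)}",
        f"--chunk={geo.chunk_kib}",
        f"--metadata={geo.metadata}",
        f"--layout={geo.layout}",
        f"--data-offset={geo.data_offset_kib}K",
        *devices,
    ]
```

For overlay #2, pass three devices with "missing" in the slot of the not-fully-synced disk. `shlex.join(argv)` turns the list into a printable shell command.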
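The overlap zone and the linear mapping can be worked out roughly as follows. This is a sketch, assuming RAID5 going from 3 to 4 disks, all offsets and sizes in 512-byte sectors, and a known reshape position. mdstat only said about 1GiB, so treat that position as an estimate. Ignoring chunk granularity, logical sector x of a RAID5 with d data disks sits about x/d sectors past the data offset on each member. The grown layout has therefore overwritten member sectors up to off_new + progress/3. Old-layout data for x survives once off_old + x/2 lies past that point, and everything from there up to progress is readable from both raids. Function and device names are hypothetical.

```python
from typing import Optional, Tuple


def overlap_zone(progress: int, off_old: int, off_new: int,
                 n_old: int = 3, n_new: int = 4, margin: int = 0) -> Optional[Tuple[int, int]]:
    """Logical sector range [lo, hi) readable from BOTH the grown and the old raid.

    Assumes RAID5 (n - 1 data disks) and that the reshape wrote the new layout
    for logical sectors [0, progress). Approximate: ignores chunk granularity,
    so use a margin of at least one stripe.
    """
    d_old, d_new = n_old - 1, n_new - 1
    # New layout occupies member sectors [off_new, off_new + progress/d_new).
    # Old logical sector x lives at member sector off_old + x/d_old and is
    # intact once that is at or past the end of the overwritten region.
    num = d_old * ((off_new - off_old) * d_new + progress)
    first_intact = -(-num // d_new)  # ceiling division
    lo, hi = max(0, first_intact) + margin, progress - margin
    return (lo, hi) if lo < hi else None


def linear_table(split: int, total: int, grown_md: str, old_md: str) -> str:
    """dm-linear table: sectors [0, split) from the grown raid, the rest from the old one.

    Line format: <start> <length> linear <device> <offset on device>.
    """
    return (f"0 {split} linear {grown_md} 0\n"
            f"{split} {total - split} linear {old_md} {split}")
```

The table can be fed to `dmsetup create` on stdin. Here total would be the size of the old 3-disk raid, because the filesystem can't have been grown yet. Because progress is only an estimate, pick a split well inside the zone. Before trusting it, compare a few sectors from both raids around that point.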