Re: Failed RAID5 array - recovery help

Adam Goryachev <mailinglists@xxxxxxxxxxxxxxxxxxxxxx> · Wed, 12 Sep 2018 09:46:26 +1000

On 12/09/18 06:37, Francois Goudal wrote:

On 07/09/18 01:14, Adam Goryachev wrote:
On 07/09/18 06:14, Francois Goudal wrote:
Hello,

I've been running a 5-disk RAID5 volume with an ext4 filesystem on
a Synology NAS since 2012 without any problems until last week, when
bad things happened.

At first, the disk in slot 5 "failed". I'm putting quotation marks
here because, as I'll explain below, I later found out that the disk
is actually in good shape, so it might have been a controller issue,
who knows...

At this point, the array is degraded but still fully working. I
don't do anything other than order a replacement disk.

A couple of days later, the new disk is delivered. I remove the failed
disk from slot 5, put in the new disk and start the resync of the volume.

Of course, halfway through, what had to happen happened: I got a URE
(unrecoverable read error) on the disk in slot 1. That disk is marked
failed, and the volume fails too, since 2 disks are now missing.

Now, it's time to think about recovery, because I unfortunately do 
not have a very recent backup of the data (lesson learned, won't do 
this ever again).

At this point, I decide to freeze everything before trying anything 
stupid.

I took all 5 original disks out of the NAS, connected them to a
Linux machine and went through the very lengthy process of imaging
them all with ddrescue.

 - Slot 5 disk (the first one that failed) turns out to read properly,
no errors at all...

 - Slot 1 disk (the one that failed next with the URE) has 2 consecutive
sectors (1 KiB) at approximately 60% of the disk that can't be read; all
other data reads fine

 - Slots 2, 3 and 4 disks read fine

So, I now have full images of all 5 disks that I can safely work on.
They are on an LVM-based volume and I have a snapshot, so I can try
(and fail with) bad mdadm commands and easily roll back to the
original dumps.

The Events counters on my disks look like this:

root@lab:/# mdadm --examine /mnt/dump2/slot{1,2,3,4,5}.img | grep Event
         Events : 2357031
         Events : 2357038
         Events : 2357041
         Events : 2357044
         Events : 2354905

Disk 5 is way behind, which is normal since the array was kept 
running for a couple days after that disk failed.

Disks 1, 2, 3 and 4 are all pretty close. Their counts are not exactly
the same, but I think this is because I didn't stop the RAID volume
before pulling the disks out, so each time a disk was pulled, the
Array State in the superblock was updated on the remaining disks. My
mistake, but hopefully not a big deal?
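
The comparison above can be scripted with a small bash helper. This is a sketch, not something from the thread: event_lag is a hypothetical name, and it assumes the Events lines arrive in the same order as the images on the mdadm --examine command line (slot1 to slot5), because grep drops the file names. It only prints how far each member lags behind the newest one; deciding what counts as "way behind" is left to the reader.

```shell
#!/usr/bin/env bash
# event_lag (hypothetical helper): read "Events : N" lines on stdin, one per
# member, in the same order as the images given to mdadm --examine, and print
# how far each member lags behind the most recent one. Slot numbers are
# inferred purely from that order, since grep drops the file names.
event_lag() {
  local re='Events : ([0-9]+)' line max=0 i
  local -a ev=()
  while IFS= read -r line; do
    [[ $line =~ $re ]] && ev+=("${BASH_REMATCH[1]}")
  done
  # find the newest (highest) event count
  for i in "${!ev[@]}"; do
    (( ev[i] > max )) && max=${ev[i]}
  done
  # report each member's distance from the newest one
  for i in "${!ev[@]}"; do
    echo "slot$((i + 1)) events=${ev[i]} behind=$(( max - ev[i] ))"
  done
}
```

For example: mdadm --examine /mnt/dump2/slot{1,2,3,4,5}.img | grep Event | event_lag. Fed the numbers above, it reports slot5 as 2139 events behind slot4, against at most 13 for slots 1 to 3.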

So, my conclusion at this point is that disks 1, 2, 3 and 4 probably
still hold a consistent state, apart from the known 1 KiB of corrupted
data. That shouldn't be a big deal: those sectors may not have been
used by the filesystem at all, and even if they were, this shouldn't
prevent me from recovering most of my files, as long as I can
reassemble the volume somehow.

I was thinking about trying something like:

mdadm --assemble --assume-clean --level=5 --raid-devices=5 /dev/md0 /dev/loop0 /dev/loop1 /dev/loop2 /dev/loop3 missing

(with /dev/loop0-3 pointing to disks 1-4 respectively, and disk 5
declared as missing)

I haven't tried this yet; would this be the right approach? Any other
suggestions are welcome.

Personally, I think this is the right "next step", but if you wanted
to recover 100% of your data, then I'd follow the process below. I
don't know all the precise magic commands, but more research and/or
trial and error should get you there, or perhaps someone else will
jump in with the details:
1) If you can identify the URE blocks, then you could use disks 2, 3,
4 and the original disk5 to recalculate the correct values for disk1,
and write them into the image copy (or to the original disk1, which
should either resolve the URE or make the disk remap it to another
physical sector).
2) Then you will need to research the timeout issue with UREs and
your disks, and fix it (assuming that is what caused the original
problem with disk5, and potentially the problem with disk1 during the
rebuild).
3) Then you can re-add the new disk5, and allow the resync to complete.
4) If possible, wipe the original disk5 and add it to the array as a
spare, or even better, convert to RAID6
5) Enable regular checks of the array so that you detect UREs before
they become a problem (during a rebuild)
6) Enjoy many more years of trouble free operation
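
Step 1) relies on the basic RAID5 property that, within a stripe, any one member's chunk is the XOR of all the other members' chunks (data and parity alike). A minimal bash sketch of that calculation follows, under stated assumptions: all members share the same data offset, so a stripe sits at the same sector in every image; the function names are hypothetical; and, because the original disk5 is about 2,100 events behind, the result is only correct if that stripe was not written after disk5 dropped out.

```shell
#!/usr/bin/env bash
# Sketch: rebuild unreadable sectors of one RAID5 member by XOR-ing the same
# sectors from every other member. Assumes identical data offsets on all
# members and that none of them holds stale data for the affected stripe.

# xor_files FILE... : XOR equal-length files byte by byte, print result as hex
xor_files() {
  local -a acc=() bytes=()
  local f i
  for f in "$@"; do
    # od -An -v -tu1: one unsigned decimal per byte, no offsets, no '*' runs
    read -r -a bytes <<< "$(od -An -v -tu1 "$f" | tr '\n' ' ')"
    for i in "${!bytes[@]}"; do
      acc[i]=$(( ${acc[i]:-0} ^ bytes[i] ))
    done
  done
  for i in "${!acc[@]}"; do printf '%02x' "${acc[i]}"; done
  printf '\n'
}

# reconstruct_sectors SECTOR COUNT IMG... : extract COUNT 512-byte sectors
# starting at SECTOR from each surviving member image, XOR them together,
# and print the reconstructed bytes as hex
reconstruct_sectors() {
  local sector=$1 count=$2 tmpdir img n=0
  shift 2
  tmpdir=$(mktemp -d)
  for img in "$@"; do
    dd if="$img" of="$tmpdir/$n" bs=512 skip="$sector" count="$count" status=none
    n=$((n + 1))
  done
  xor_files "$tmpdir"/*
  rm -rf "$tmpdir"
}
```

With the images from this thread it might be called as reconstruct_sectors SECTOR 2 slot2.img slot3.img slot4.img slot5.img, where SECTOR is the first unreadable sector from the ddrescue map file (which records byte positions, so divide by 512). The hex output could then be turned back into binary with xxd -r -p and written into the slot1 image copy with dd bs=512 seek=SECTOR conv=notrunc.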

Hope that helps. As far as data recovery goes, it sounds like you are
in an excellent position to recover everything.

Regards,
Adam

Hi,

So, after quite a bit of struggle, I finally managed to recover all my
data :)
I stuck with my proposed method. It wasn't a completely smooth ride,
but I eventually got there.
For the record, the difficulties I faced:
 - my command above was incomplete: I had to specify a size, which
wasn't obvious, because the unit for this parameter differs from the
units in the mdadm --examine output. I eventually found out that half
the Used Dev Size from mdadm --examine was the right value for the
--size option (Used Dev Size is shown in 512-byte sectors, while
--size takes KiB; like --assume-clean, --size belongs to mdadm
--create, not --assemble).
 - the chunk size also had to be forced: my RAID volume had a chunk
size of 64K (as per /proc/mdstat), and newer versions of mdadm default
to a bigger size
 - then I had to use an older version of mdadm anyway, because I got
a message like "mdadm: /dev/sdb1 is smaller than given size. xxxK <
yyyK + metadata". The page
https://raid.wiki.kernel.org/index.php/RAID_Recovery, even though
marked obsolete, was helpful
 - after all the above, I was able to re-assemble a degraded volume (4
disks out of 5), and I could see it contained a valid ext4 filesystem.
Unfortunately, I was unable to mount it, because the kernel complained
"Number of reserved GDT blocks insanely large: 8189". Synology NASes
seem to format their ext4 filesystems with a large number of reserved
GDT blocks, and recent kernels enforce a limit and refuse to mount a
filesystem that exceeds it. I couldn't find any option to force the
mount, so I had to recompile my kernel with the code that performs
this check commented out. A force option would have been nice here,
but oh well...
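
The size and chunk fixes above can be folded into a small bash sketch that prints, rather than runs, the re-create command. Assumptions to check before using it: create_cmd, sectors_to_kib and /dev/md0 are hypothetical names; the devices must be listed in the order given by the "Device Role" line of mdadm --examine, not necessarily NAS slot order; and the metadata version, data offset and layout must also match the original superblocks, which this sketch leaves at mdadm's defaults. Printing the command first keeps a destructive step reviewable.

```shell
#!/usr/bin/env bash
# Sketch only: print (never run) an mdadm --create command that re-creates a
# degraded array over loop devices. Review the output, and only ever run it
# against copies or snapshots of the images, never the original disks.

# sectors_to_kib SECTORS: Used Dev Size on 1.x superblocks is reported in
# 512-byte sectors, while --size expects KiB, hence the halving.
sectors_to_kib() {
  echo $(( $1 / 2 ))
}

# create_cmd USED_DEV_SIZE_SECTORS CHUNK_KIB DEV... : devices in array-role
# order; the word "missing" stands in for an absent member.
# /dev/md0 is an arbitrary, hypothetical target device name.
create_cmd() {
  local chunk=$2 size_kib
  size_kib=$(sectors_to_kib "$1")
  shift 2
  echo mdadm --create /dev/md0 --assume-clean --level=5 \
    --raid-devices=$# --chunk="$chunk" --size="$size_kib" "$@"
}
```

For example, after attaching each image with losetup --find --show /mnt/dump2/slot1.img (and so on), create_cmd 5860268032 64 /dev/loop0 /dev/loop1 /dev/loop2 /dev/loop3 missing prints a command with --size=2930134016; the sector count here is only an illustrative value, not the one from this array.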

After all that, I was able to access my data and rsync it somewhere
safe.

OK, great result ;)

I decided not to go down the path of trying to recover those two
512-byte sectors. I've already spent a lot of energy on this, and I can
accept possibly having one corrupted file. But thanks for the
suggestion.

Regarding your item 2), it is clear to me that the problem with disk1
during the rebuild was due to bad sectors. ddrescue also failed to read
those sectors, so the rebuild was bound to fail (and it failed at about
60%, which also corresponds to where those bad blocks are on that
disk). So this disk goes in the trash. The other ones probably will
too, but for a different reason (they are almost 10 years old and I'm
changing my storage strategy, see below).

Sure, but equally, bad sectors will keep failing for as long as you
only read from them; if you write to them, the disk can either repair
them (by rewriting them in place) or remap them to spare sectors from
its reserve, and either way, to MD they will look magically fixed. In
any case, I agree it's time to replace your drives if they are 10
years old ;)
With regard to 5), how would you do this? Do you mean that I should
deliberately pull a disk from the array, then put it back and start
a rebuild, every once in a while?
Definitely not!!! The disk you pull would be marked as outdated, and
any issue on the remaining disk(s) would result in exactly the
experience you are having now.
Or is there some magic mdadm command for this?
Start here, but there is a lot more information out there:
https://raid.wiki.kernel.org/index.php/Scrubbing
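
The scrubbing approach on that page is driven through md's sysfs files: writing "check" to sync_action makes md read and verify every stripe, and when a read error turns up along the way md tries to rewrite the sector from the redundant data, which is how UREs get caught before a rebuild needs them. mismatch_cnt then reports how many sectors did not match. In the bash sketch below, SYSFS_ROOT is a hypothetical override (default /sys/block) so the functions can be tried against a scratch directory; real use needs root and an actual md device.

```shell
#!/usr/bin/env bash
# Sketch of an md scrub via sysfs. SYSFS_ROOT is a hypothetical override so
# the functions can be exercised against a scratch directory.
SYSFS_ROOT=${SYSFS_ROOT:-/sys/block}

# start_scrub MD: ask md to read every stripe and verify it ("check" only
# counts mismatches; "repair" would also rewrite them)
start_scrub() {
  echo check > "$SYSFS_ROOT/$1/md/sync_action"
}

# scrub_status MD: current action ("check" while running, "idle" when done)
# and the number of mismatched sectors counted by the last check
scrub_status() {
  local d="$SYSFS_ROOT/$1/md"
  echo "action=$(cat "$d/sync_action") mismatches=$(cat "$d/mismatch_cnt")"
}
```

Something like start_scrub md0 could be run monthly from cron; Debian's mdadm package, for instance, ships a checkarray script that is scheduled this way.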

At least, nothing that's exposed through the Synology DSM user
interface, unfortunately. But I could always do it from the command
line, I guess.

Some NAS devices include this in their standard cron jobs, but
depending on the age of the NAS and its kernel, yours might be too old
to include this feature.

There are some lessons learned here, and I have decided to rethink my
storage strategy. No more RAID5: I'll do RAID10 instead, with 4 bigger
disks. That leaves one slot free in my NAS, which I'm going to use for
a very large disk, as big as my whole RAID10 volume, and I will set up
data replication between the RAID10 and that single disk. On top of
that, the data will also be synchronized to another place as an
off-site backup.
Not a bad solution. In the past, I've done RAID10 + RAID1 with my last
really big disk, and used write-mostly for the single drive to reduce
the performance impact. RAID10 only protects you against a single drive
loss (though you might be lucky, lose two drives and still be OK).
With RAID10 + 1 you can lose any two drives, and if you are lucky, you
can lose three without any data loss.
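
The "lucky" cases can be counted by brute force over every combination of failed drives. This bash sketch assumes the usual two-copy RAID10 layout where the four disks form two mirrored pairs, numbered (0,1) and (2,3) here, with drive 4 as the big disk mirrored against the whole RAID10; the drive numbering and function names are hypothetical.

```shell
#!/usr/bin/env bash
# Count which combinations of failed drives still leave the data readable.
# Bit i of a mask set = drive i failed. Drives 0-3: RAID10 as mirrored
# pairs (0,1) and (2,3). Drive 4: big disk mirrored against the RAID10.

# popcount N: number of set bits, i.e. how many drives failed
popcount() {
  local n=$1 c=0
  while (( n )); do (( c += n & 1, n >>= 1 )); done
  echo "$c"
}

# raid10_ok MASK: true unless both members of some mirrored pair failed
raid10_ok() {
  local m=$1
  (( (m & 3) != 3 && (m & 12) != 12 ))
}

# survives MASK WITH_BIG: data survives if the RAID10 is intact, or if the
# big disk is part of the setup and has not failed
survives() {
  local m=$1 with_big=$2
  raid10_ok "$m" && return 0
  (( with_big && (m & 16) == 0 ))
}

# summary WITH_BIG: survivable/total combinations for each failure count
summary() {
  local with_big=$1 drives=4 k m
  local -a ok=() total=()
  (( with_big )) && drives=5
  for (( m = 0; m < (1 << drives); m++ )); do
    k=$(popcount "$m")
    total[k]=$(( ${total[k]:-0} + 1 ))
    if survives "$m" "$with_big"; then ok[k]=$(( ${ok[k]:-0} + 1 )); fi
  done
  for (( k = 1; k <= drives; k++ )); do
    echo "$k failed: ${ok[k]:-0}/${total[k]} survivable"
  done
}
```

Run as summary 0, plain 4-disk RAID10 survives 4 of the 6 possible two-drive failures; run as summary 1, the RAID10 + 1 setup survives all 10 two-drive failures and 8 of the 10 three-drive ones, which matches the claims above.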
There are other advantages to using the 5th drive for "backups" instead
of RAID, but the disadvantage is that you can lose whatever changed
between the last backup and now, and/or the backup might "miss" some
data. Either option is valid though; it just depends on your
environment and risks/needs.

I wanted to thank you for taking the time to answer my question and
offer help. And also thanks to everyone who contributed to the wiki
pages on kernel.org, which have been really helpful and kept me from
rushing into bad things that would have resulted in true data loss.

The best move you made was imaging all the drives. From that one step, 
almost anything you did was recoverable ;)

Regards,
Adam

--
Adam Goryachev Website Managers www.websitemanagers.com.au