On Tue, Aug 16, 2016 at 12:40:45PM +0100, Tim Small wrote:
> I didn't know about the bad block functionality in md.

I don't know how it's supposed to work either. I disable it everywhere.
(The option is --update=no-bbl, but if I remember correctly it is only
accepted while the bbl is empty.)

I don't want arrays to have bad blocks. I don't want disks with bad
blocks to be left in the array. I don't trust disks that develop defects
or lose data, so the only choice for me is to replace them with new ones.

Silently ignoring disk errors, silently fixing errors in the background,
keeping bad disks around: in my view this only causes much more trouble
later on. I want to be notified about any and all problems md encounters
so I can decide what to do... Unfortunately not many people seem to
share this view, and the "read errors are normal" faction seems to be
growing...

Identical bad blocks on multiple devices should be the reason why your
md is reporting I/O errors: those blocks are already marked bad by md,
so it does not even try to read them from the disks. The last time I
encountered these I ended up editing metadata or doing a (dangerous)
re-create, since I found no other way to get rid of them.

> In the meantime I'm trying to work out what data (if any) is now
> inaccessible. This is made slightly more interesting because this array
> has 'bcache' sitting in front of it, so I might have good data in the
> cache on the SSD which is marked bad/inaccessible on the raid5 md device.

md won't be able to use that to repair anything by itself. Does bcache
have some recovery mode that makes it dump everything it has cached back
to disk? That comes with its own dangers, if the cache is wrong or there
are other bugs...

Usually for such dangerous experiments you would use an overlay:

https://raid.wiki.kernel.org/index.php/Recovering_a_failed_software_RAID#Making_the_harddisks_read-only_using_an_overlay_file

but I'm not sure how well that plays together with bcache either.

If you want to go with re-create, in your case it would be something like:

mdadm --create /dev/md42 --assume-clean \
    --metadata=1.2 --data-offset=128M --level=5 --chunk=512 --layout=ls \
    --raid-devices=3 /dev/overlay/sd{a,c,d}2

You have to specify all variables because mdadm defaults change over time.

Then --stop and --assemble with --update=no-bbl before the horrors
repeat... Mount and verify files for correctness; pick files larger than
disks * chunksize, since those span every device and a wrong layout or
disk order guess will show up as corruption immediately.

Then --add a fourth drive and --replace the one you said has bad sectors
according to SMART. Book a flight to the Olympics in Rio and win a gold
medal in long-distance hard disk throwing.

Once your RAID is running on three fully operational drives, you can do
your RAID6 or whatever.

If you don't have a backup, make one before doing anything else, as long
as you still have at least some access to your stuff.

Regards
Andreas Klauer
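P.S.: A few concrete sketches for the above. All untested, and the
device names are examples only, so adjust to your setup. To see what md
has actually recorded before deciding anything, mdadm (3.3 and later)
can dump the bad block log straight from the metadata of each member:

mdadm --examine-badblocks /dev/sda2

Identical entries on several members are exactly the blocks md refuses
to read, no matter what the disks themselves would say.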
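The overlay from the wiki link is nothing more than a device-mapper
snapshot backed by a sparse file: writes land in the file, reads fall
through to the real disk. Per member, roughly:

# sparse file, only grows as far as writes actually happen
truncate -s 4G /tmp/overlay-sda2
loop=$(losetup -f --show /tmp/overlay-sda2)
# dm snapshot: reads come from /dev/sda2, writes go to the loop file
dmsetup create overlay-sda2 --table \
    "0 $(blockdev --getsz /dev/sda2) snapshot /dev/sda2 $loop P 8"

dmsetup puts the result under /dev/mapper/, so point the --create at
/dev/mapper/overlay-sd?2 (or set up symlinks if you prefer the
/dev/overlay/ paths used above).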
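And the stop / assemble / replace dance, again only a sketch, with sde2
standing in for the hypothetical new fourth drive and sdX2 for whichever
disk SMART flags:

mdadm --stop /dev/md42
mdadm --assemble /dev/md42 --update=no-bbl /dev/mapper/overlay-sd?2
# once verified, and re-done on the real disks:
mdadm /dev/md42 --add /dev/sde2
mdadm /dev/md42 --replace /dev/sdX2 --with /dev/sde2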