On 05/01/2020 20:06, William Morgan wrote:
> Hello,
>
> I'm new here and likely don't understand the etiquette. Please be
> patient with me.
You're fine :-) Beautiful problem report, I have to say ...
> I have two raid 5 arrays connected through an LSI 9201-16e (SAS2116)
> card. I also have a few other single drives connected to the SAS card.
> I was mounting both arrays through fstab using the original UUIDs of
> the arrays. The system had been working great, remounting both arrays
> on boot, etc. until yesterday when I shut the system off to remove one
> of the single drives.
>
> I didn't touch the raid array drives at all, but when I rebooted the
> system, neither raid array mounted successfully. When I checked their
> status, I noticed both arrays had changed to "inactive", and further
> investigation showed that the UUIDs of both arrays had changed.
>
> I started investigating using the troubleshooting page of the Linux
> raid wiki. I tried to reassemble (no --force however) but it wasn't
> successful. Here is a summary of what I noticed:
> Smart data seems OK for all drives. I found some reports of bad blocks,
> non-identical event counts, and some missing array members.
One array looks pretty good, the other one less so ...
> md0 consists of 4x 8TB drives:
>
> role  drive  events  state
>  0    sdc    10080   A.AA  (bad blocks reported on this drive)
>  1    sdd    10070   AAAA
>  2    sde    10070   AAAA
>  3    sdf    10080   A.AA  (bad blocks reported on this drive)
This array looks good. I'm wondering whether the new UUID is something
to do with the fact that it thinks it's a raid0. I'm sure I've seen this
before, and it's not anything to worry about. Plus all your event counts
are very close. My main concern is that two drives have one event count,
and two have the other, which means that a little data loss is a
distinct possibility.
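
For reference, those numbers come straight out of the superblocks -
something like this (device names taken from your report) prints the
UUID, event count and state each member believes in:

    for d in /dev/sd{c,d,e,f}; do
        echo "== $d =="
        # Array UUID, Events, Device Role and Array State are all part
        # of the 1.x superblock that --examine prints.
        mdadm --examine "$d" | grep -E 'Array UUID|Events|Device Role|Array State'
    done
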
Look at the page on overlays: if you've got a spare disk you can put
overlay files on, do a force-assemble against the overlays, and
everything will probably be (almost) fine. Do a couple of fscks until
it's clean, check everything's okay, and if it is you can do a
force-assemble on the array directly. So this array is pretty good.
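
If it helps, the overlay setup from the wiki boils down to something
like this - an untested sketch, assuming a spare disk mounted at
/overlay (adjust device names, sizes and paths to suit):

    # Release the members from the inactive array first.
    mdadm --stop /dev/md0

    # Put a copy-on-write overlay over each md0 member, so the
    # force-assemble and fsck only ever write to sparse files on the
    # spare disk, never to the real drives.
    for d in sdc sdd sde sdf; do
        size=$(blockdev --getsz /dev/$d)   # device size in 512-byte sectors
        truncate -s 50G /overlay/$d.ovl    # sparse; only changed blocks land here
        loop=$(losetup -f --show /overlay/$d.ovl)
        dmsetup create $d-ovl --table "0 $size snapshot /dev/$d $loop P 8"
    done

    # Assemble and check against the overlays, not the real drives.
    mdadm --assemble --force /dev/md0 /dev/mapper/sd{c,d,e,f}-ovl
    fsck /dev/md0

When you're done, dmsetup remove the overlays and losetup -d the loop
devices; nothing on the real drives will have changed.
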
> md1 consists of 4x 4TB drives:
>
> role  drive  events  state
>  0    sdj     5948   AAAA
>  1    sdk    38643   .AAA
>  2    sdl    38643   .AAA
>  3    sdm    38643   .AAA
This array *should* be easy to recover. Again, use overlays, and
force-assemble sdk, sdl, and sdm. DO NOT include sdj - this was ejected
from the array a long time ago, and including it will seriously mess up
your array. This means you've effectively been running a 3-disk raid-0
for quite a while (a raid-5 missing one member still works, but has no
redundancy left), so provided nothing more goes wrong you'll have a
perfect recovery, but any trouble and your data is toast. Is there any
way you can ddrescue these three drives before attempting a recovery?
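
If you can lay your hands on spare drives, that imaging step is just
ddrescue from each member to a spare at least as big (the /dev/sdX..Z
targets and map-file paths below are placeholders):

    # Copy each remaining md1 member onto a spare drive before any
    # recovery attempt; the map file lets ddrescue resume and records
    # any unreadable sectors.
    ddrescue -f /dev/sdk /dev/sdX /root/sdk.map
    ddrescue -f /dev/sdl /dev/sdY /root/sdl.map
    ddrescue -f /dev/sdm /dev/sdZ /root/sdm.map
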
If it does assemble fine with overlays, then assemble the array with
those three drives, then re-add sdj. This is where the danger lies -
rebuilding sdj means reading every sector of the other three drives,
and a further failure on any of them during that rebuild will take the
array down.
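
Spelled out, that last stage would look roughly like this (again just
a sketch, assuming the overlay run came back clean):

    # Stop the half-assembled (inactive) array, then bring it up from
    # the three good members only, leaving sdj out.
    mdadm --stop /dev/md1
    mdadm --assemble --force /dev/md1 /dev/sdk /dev/sdl /dev/sdm

    # Only once the filesystem has been checked: put sdj back in.
    # --re-add will almost certainly refuse after this long out of the
    # array (no bitmap to catch up from), in which case a plain --add
    # rebuilds it from scratch.
    mdadm /dev/md1 --re-add /dev/sdj || mdadm /dev/md1 --add /dev/sdj
    cat /proc/mdstat        # watch the rebuild
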
> These are the things that stand out to me, but there may be other
> issues I've overlooked. I have included the full output of the
> troubleshooting commands below. I don't understand why the UUIDs would
> have changed, but even after mkconf created a new mdadm.conf file, the
> arrays would not assemble or mount. And I don't know how to fix the
> situation without losing data. Please let me know how to proceed.
>
> Thanks,
> Bill
Hopefully some more experienced posters will chime in and add more
info, but if you're happy using overlays you can try my advice safely
and, if it works, recover your arrays.
Cheers,
Wol