Re: raid6 glorious failure - help (or at least a shoulder to cry on) needed

On 05/04/18 03:51, Wol's lists wrote:
On 04/04/18 15:55, John Lupa wrote:
Hi All,

Hi John,

I'm in a bit of trouble with a raid6 array and any feedback would be
appreciated...

Of course, no one will be held responsible for whatever happens to the array; doing anything with it will ultimately be my decision alone.

It all looks good (inasmuch as a crashed array can look good :-)

So, long story short:

- raid6, 5 x 3TB disks, Seagate Barracuda (ST3000DM001-1ER1 x 2, ST3000DM008-2DM1 x 2, ST3000DM001-1CH1)
- I know now, after reading raid.wiki.kernel.org, that the HDD choice was not very good and I have to deal with that now (I suspect that the disks' missing "scterc" feature, or a hardware failure of the LSI controller, caused all this mess)
- disk connections: mixed between motherboard SATA ports and the LSI HBA controller; maybe not the best idea to mix, but this setup had worked fine for 2-3 years
- CentOS release 6.9 (Final), ASRock H77 Pro4-M, 8GB RAM, LSI controller (SAS 9217-8i Host Bus Adapter)
- another raid1 (Samsung SSD 840) with the OS is still running with no glitches, both disks connected via motherboard SATA

Not knowing what your LSI HBA controller is ...

(Please note that the /dev/sdX letters below may change as I have
added/removed other disks to clone the raid6 disks or changed their SATA
ports)
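
For reference, a read-only way to pin down which physical drive is which regardless of the shifting sdX letters is to list the persistent names, which embed the model and serial number (nothing here is specific to this box):

# ls -l /dev/disk/by-id/ | grep -v part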

###############################################################################################
1. Suddenly the array went offline. I have quite a few logs, but I copied just what I thought would be helpful;
please let me know if the full log (~600k) would be better. It also looks like I may have some filesystem errors,
but hey, don't discourage me - one problem at a time!






###############################################################################################
2. since the failure I have done no writes on those disks

Good ...



###############################################################################################
3. smartctl long and short tests show the disks are ok; I can provide the
output should you think it is useful.

Probably no need, that looks good too ...
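
For completeness, the self-tests being referred to are along these lines (sda is just an example member):

# smartctl -t short /dev/sda     # runs in a couple of minutes
# smartctl -t long /dev/sda      # several hours on a 3TB drive
# smartctl -a /dev/sda           # read the results once the self-test has finished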

As you've hopefully picked up, setting the kernel timeout to 180 will fix the problem with these drives being raid-unfriendly, at the cost of making the system chronically slow if it hits a problem ...
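
Spelled out, the timeout workaround is roughly the following; the sysfs path is standard, 180 is the value suggested above, and sda again stands in for each array member:

# smartctl -l scterc /dev/sda                # check whether the drive supports SCT ERC at all
# echo 180 > /sys/block/sda/device/timeout   # raise the kernel command timeout from the default 30s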

Nothing, unfortunately, can fix the bad reputation the 3TB model has for dying terminally early ...




###############################################################################################
4. the "mdadm --examine" output (I've put in some "<<<<" signs to timestamps
and event numbers):

/dev/sda:
           Magic : a92b4efc
         Version : 1.2
     Feature Map : 0x0
      Array UUID : 867b16d3:0a005ef1:3828705e:0ad31dcd
            Name : storage00server:100  (local to host storage00server)
   Creation Time : Thu May  9 21:09:42 2013
      Raid Level : raid6
    Raid Devices : 5

  Avail Dev Size : 5860271024 (2794.39 GiB 3000.46 GB)
      Array Size : 8790405120 (8383.18 GiB 9001.37 GB)
   Used Dev Size : 5860270080 (2794.39 GiB 3000.46 GB)
     Data Offset : 262144 sectors
    Super Offset : 8 sectors
    Unused Space : before=262056 sectors, after=944 sectors
           State : clean
     Device UUID : 74882e49:8294ae56:1c6eafbe:2c9eb6ec

     Update Time : Fri Mar  9 11:33:32 2018   <<<<<<<<<<<<<<<<<<<<<<<
   Bad Block Log : 512 entries available at offset 72 sectors
        Checksum : e0b8ef21 - correct
          Events : 2444205   <<<<<<<<<<<<<<<<<<<<<<<

          Layout : left-symmetric
      Chunk Size : 512K

    Device Role : Active device 0
    Array State : AAAAA ('A' == active, '.' == missing, 'R' == replacing)




/dev/sdb:
    MBR Magic : aa55
Partition[0] :   4294967295 sectors at            1 (type ee)



/dev/sde:
           Magic : a92b4efc
         Version : 1.2
     Feature Map : 0x0
      Array UUID : 867b16d3:0a005ef1:3828705e:0ad31dcd
            Name : storage00server:100  (local to host storage00server)
   Creation Time : Thu May  9 21:09:42 2013
      Raid Level : raid6
    Raid Devices : 5

  Avail Dev Size : 5860271024 (2794.39 GiB 3000.46 GB)
      Array Size : 8790405120 (8383.18 GiB 9001.37 GB)
   Used Dev Size : 5860270080 (2794.39 GiB 3000.46 GB)
     Data Offset : 262144 sectors
    Super Offset : 8 sectors
    Unused Space : before=262056 sectors, after=944 sectors
           State : clean
     Device UUID : 325fcaac:8195916b:8cb2871b:3f54f1c4

     Update Time : Fri Mar  9 11:33:32 2018   <<<<<<<<<<<<<<<<<<<<<<<
   Bad Block Log : 512 entries available at offset 72 sectors
        Checksum : 8e4ac163 - correct
          Events : 2444205   <<<<<<<<<<<<<<<<<<<<<<<

          Layout : left-symmetric
      Chunk Size : 512K

    Device Role : Active device 2
    Array State : AAAAA ('A' == active, '.' == missing, 'R' == replacing)



/dev/sdf:
           Magic : a92b4efc
         Version : 1.2
     Feature Map : 0x0
      Array UUID : 867b16d3:0a005ef1:3828705e:0ad31dcd
            Name : storage00server:100  (local to host storage00server)
   Creation Time : Thu May  9 21:09:42 2013
      Raid Level : raid6
    Raid Devices : 5

  Avail Dev Size : 5860271024 (2794.39 GiB 3000.46 GB)
      Array Size : 8790405120 (8383.18 GiB 9001.37 GB)
   Used Dev Size : 5860270080 (2794.39 GiB 3000.46 GB)
     Data Offset : 262144 sectors
    Super Offset : 8 sectors
    Unused Space : before=262056 sectors, after=944 sectors
           State : clean
     Device UUID : fd3ccca5:2f0ec0af:1e1f64f8:be53ce86

     Update Time : Fri Mar  9 11:33:32 2018   <<<<<<<<<<<<<<<<<<<<<<<
   Bad Block Log : 512 entries available at offset 72 sectors
        Checksum : 6a3483eb - correct
          Events : 2444205   <<<<<<<<<<<<<<<<<<<<<<<

          Layout : left-symmetric
      Chunk Size : 512K

    Device Role : Active device 1
    Array State : AAAAA ('A' == active, '.' == missing, 'R' == replacing)



/dev/sdg:
           Magic : a92b4efc
         Version : 1.2
     Feature Map : 0x0
      Array UUID : 867b16d3:0a005ef1:3828705e:0ad31dcd
            Name : storage00server:100  (local to host storage00server)
   Creation Time : Thu May  9 21:09:42 2013
      Raid Level : raid6
    Raid Devices : 5

  Avail Dev Size : 5860271024 (2794.39 GiB 3000.46 GB)
      Array Size : 8790405120 (8383.18 GiB 9001.37 GB)
   Used Dev Size : 5860270080 (2794.39 GiB 3000.46 GB)
     Data Offset : 262144 sectors
    Super Offset : 8 sectors
    Unused Space : before=262056 sectors, after=944 sectors
           State : clean
     Device UUID : 3fe05e31:aea12f6f:30219c17:c858e069

     Update Time : Sat Mar 10 03:28:16 2018   <<<<<<<<<<<<<<<<<<<<<<<
   Bad Block Log : 512 entries available at offset 72 sectors
        Checksum : b776a20c - correct
          Events : 2444333   <<<<<<<<<<<<<<<<<<<<<<<

          Layout : left-symmetric
      Chunk Size : 512K

    Device Role : Active device 3
    Array State : ...A. ('A' == active, '.' == missing, 'R' == replacing)



So one disk out of five is completely gone (sdb, at least from the raid6 array's point of view?).
Three of them (sda, sde and sdf) have the same event count (2444205) and the same timestamp (Fri Mar  9 11:33:32 2018).
The last one, sdg, has a later timestamp (Sat Mar 10 03:28:16 2018) and a higher event count (2444333).
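
A quick, read-only way to compare those fields across all the members in one go is something like:

# mdadm --examine /dev/sd[abefg] | egrep '/dev/sd|Update Time|Events'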

I'm guessing the LSI is an expansion board giving you more SATA ports - are the two "dodgy" drives (sdb and sdg) on that?

What's the output of fdisk on sdb? It could be something as simple as the partition table getting trashed. There was a case here a while back, where the user lost two drives from a 3-disk raid-5. Something had trashed the partition tables, and when they were re-created the array came back hunky-dory. Not saying that's the case here, but you never know ...
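
All of the following are read-only and safe to run on sdb as it stands (parted is assumed to be installed, and the sdb1 node may or may not exist):

# fdisk -l /dev/sdb                  # show whatever partition table is visible to fdisk
# parted -s /dev/sdb unit s print    # GPT-aware view; the type-ee entry above hints at a protective MBR
# mdadm --examine /dev/sdb1          # in case the md superblock is sitting inside a partition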

The other thing, run lsdrv and post that output. I can't particularly interpret it, but it'll tell others a lot ...
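
lsdrv is Phil Turmel's inventory script; assuming it still lives at its usual home, it can be fetched and run like this (needs python and root):

# git clone https://github.com/pturmel/lsdrv
# ./lsdrv/lsdrv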

###############################################################################################
5. /dev/md127 is automatically recognized by the system at boot and brought to the state below (output follows).
It seems to be trying to assemble /dev/md127 using only the disk with the latest timestamp.

Sounds about right ...

# cat /proc/mdstat
Personalities : [raid1] [raid6] [raid5] [raid4] [raid0]
md0 : active raid1 sdf1[3] sdg1[2]
       716736 blocks super 1.0 [2/2] [UU]

md4 : active raid0 sdi[1] sdh[0]
       3906766848 blocks super 1.2 512k chunks

md127 : active raid6 sda[5](F) sdb[7](F) sde[8] sdc[9](F)
       8790405120 blocks super 1.2 level 6, 512k chunk, algorithm 2 [5/1] [___U_]

md1 : active raid1 sdg2[2] sdf2[1]
       116436864 blocks super 1.1 [2/2] [UU]
       bitmap: 1/1 pages [4KB], 65536KB chunk

unused devices: <none>



# mdadm --detail /dev/md127
/dev/md127:
         Version : 1.2
   Creation Time : Thu May  9 21:09:42 2013
      Raid Level : raid6
   Used Dev Size : -1
    Raid Devices : 5
   Total Devices : 1
     Persistence : Superblock is persistent

     Update Time : Sat Mar 10 03:28:16 2018
           State : active, FAILED, Not Started
  Active Devices : 1
Working Devices : 1
  Failed Devices : 0
   Spare Devices : 0

          Layout : left-symmetric
      Chunk Size : 512K

            Name : storage00server:100  (local to host storage00server)
            UUID : 867b16d3:0a005ef1:3828705e:0ad31dcd
          Events : 2444333

     Number   Major   Minor   RaidDevice State
        0       0        0        0      removed
        2       0        0        2      removed
        4       0        0        4      removed
        8       8       80        3      active sync   /dev/sdf   <<<<<<<<<< (previously detected as sdg, the one with the greatest event count: 2444333)
        8       0        0        8      removed



###############################################################################################
6. So, with 5 devices in the raid6, I had the fancy idea of assembling the array using only the three drives that share the same event count and timestamp, but I got this output:

# mdadm --verbose --assemble --readonly /dev/md13 /dev/sda /dev/sdf /dev/sdg
mdadm: looking for devices for /dev/md13
mdadm: Found some drive for an array that is already active: /dev/md/storage00server:100
mdadm: giving up.

!!! OK, that doesn't look like a good idea; let's "mdadm --stop /dev/md127" and then reuse its old name of md127:

# mdadm --verbose --assemble --readonly /dev/md127 /dev/sda /dev/sdf /dev/sdg
mdadm: looking for devices for /dev/md127
mdadm: /dev/sda is identified as a member of /dev/md127, slot 0.
mdadm: /dev/sdf is identified as a member of /dev/md127, slot 2.
mdadm: /dev/sdg is identified as a member of /dev/md127, slot 1.
mdadm: added /dev/sdg to /dev/md127 as 1
mdadm: added /dev/sdf to /dev/md127 as 2
mdadm: no uptodate device for slot 3 of /dev/md127
mdadm: no uptodate device for slot 4 of /dev/md127
mdadm: added /dev/sda to /dev/md127 as 0
mdadm: /dev/md127 assembled from 3 drives - need all 5 to start it (use
--run to insist).

Should I insist?

Don't see why not. Just be warned, you will now effectively have a 3-disk raid-0 - no redundancy WHATSOEVER. I'd run a READ-ONLY fsck over the file system to check it's all okay, then mount it READ-ONLY and see whether the data looks good. At this point, a backup would be a really good idea.
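
Spelled out, that suggestion would look roughly like the sketch below; the filesystem type (ext4) and mount point are assumptions, so adjust to whatever is actually on the array:

# mdadm --stop /dev/md127            # if the auto-assembled, failed array is still hanging around
# mdadm --verbose --assemble --readonly --run /dev/md127 /dev/sda /dev/sdf /dev/sdg
# fsck.ext4 -n /dev/md127            # -n: check only, never write
# mount -o ro /dev/md127 /mnt/check  # read-only mount, then eyeball the data and take that backup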

###############################################################################################


I am now in the process of dd+bzip2'ing the physical disks before trying anything potentially dangerous, which is quite time-consuming.
Before I do anything with the array I also have to figure out where to find enough disk space for these copies.
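
For what it's worth, the imaging step boils down to something like the following per disk; the destination path is only an example and must live on storage outside the array (GNU ddrescue copes better if a disk turns out to have read errors):

# dd if=/dev/sda bs=1M conv=noerror,sync | bzip2 -c > /some/other/disk/sda.img.bz2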

The bottom line is that I really need my data back: I have a partial backup from 1-2 months ago, but I have since added new files that are quite important.

Anyway, to circle back to the beginning of the email: any ideas would be appreciated, and feel free to ask for more info if needed.

Once you've got this far, if you're happy your hardware is good (the drives certainly appear to be, dunno about the controllers), MAKE SURE THAT TIMEOUT SCRIPT IS SET TO RUN EVERY BOOT, and just add the two duff disks back.
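
On CentOS 6 the simplest way to make the timeout stick across boots is probably a few lines in /etc/rc.local; the device letters below are only an example and will need adjusting if they move around (a udev rule keyed on the drive model would be more robust):

# cat >> /etc/rc.local <<'EOF'
# raid members lack SCT ERC, so give the kernel a long command timeout
for d in sda sde sdf sdg; do
    [ -w /sys/block/$d/device/timeout ] && echo 180 > /sys/block/$d/device/timeout
done
EOF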

I would, however, make sure I've blown away as much of any partition remnants

Any reason why --re-add'ing sdg (as is, not blowing away anything) is unsafe?
There is a bitmap so this should be the quickest way to get back some redundancy.
Then diagnose sdb and deal with it.

as I can from those disks first - dd a meg or so of zeroes at the start and end of the drive, recreate the partitions, and run "mdadm --zero-superblock". Preferably done on a different machine ... (there are various cases of drives being "added back" with old data on them causing havoc - that's why it pays to clean them).
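
Put into commands - and only once the dd images are safely taken and the array is running again - the two routes being discussed look roughly like this (device names as per the listings above):

# mdadm /dev/md127 --re-add /dev/sdg          # Eyal's route for sdg: quickest if the bitmap covers the gap
# mdadm --zero-superblock /dev/sdb            # Wol's route for a disk being cleaned first ...
# dd if=/dev/zero of=/dev/sdb bs=1M count=1   # ... clear any stale partition table at the start ...
# mdadm /dev/md127 --add /dev/sdb             # ... then let md rebuild onto it from scratch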

Cheers,
Wol

Good luck John

--
Eyal Lebedinsky (eyal@xxxxxxxxxxxxxx)