Re: Help / advice RAID5 Inactive

Hi Wol,

Thanks for your reply. I run BackupPC from a separate host and have good backups, so I'm not sweating too much!

I think you're right about drive 2 being bumped a while ago. That would make sense with the event counts. My bad for having no error reporting enabled to alert me. Very disappointing though; these are Samsung drives, only 2 years old.

Given I have backups, I went for the --force option and am happy to report it all went smoothly.
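For the record, the sequence I used was roughly the following (a sketch; the device names are from my setup, adjust as needed):

```shell
# Stop the partially-assembled array, then sanity-check event
# counts before forcing (the stale drive is the low count).
mdadm --stop /dev/md0
for d in /dev/sda1 /dev/sdc1 /dev/sdd1; do
    printf '%s: ' "$d"
    mdadm --examine "$d" | grep -E 'Events|Update Time'
done

# Force-assemble; mdadm bumps the stale drive's event count
# up to match so the array can start.
mdadm --assemble --force /dev/md0 /dev/sda1 /dev/sdc1 /dev/sdd1
cat /proc/mdstat
```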

I am not seeing any evidence of a rebuild, which is a surprise.
|  # cat /proc/mdstat 
|  Personalities : [raid6] [raid5] [raid4] 
|  md0 : active raid5 sda1[0] sdd1[3] sdc1[1]
|        3906764800 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/3] [UUU]
|        bitmap: 0/15 pages [0KB], 65536KB chunk
|  unused devices: <none>

The raw device is encrypted; luksOpen worked without problems.

Now running xfs_repair on one of the logical volumes. It looks like I have some data loss, but it is minor. Fortunately the server has been sitting idle for a couple of weeks due to vacation.
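For completeness, the order of operations on top of the array was (a sketch; the mapper name and VG/LV names below are placeholders for my setup):

```shell
# Unlock the encrypted md device, activate the LVM volumes on it,
# then repair the XFS filesystem (which must be unmounted).
cryptsetup luksOpen /dev/md0 crypt_md0
vgchange -ay                       # activate logical volumes
umount /mnt/data 2>/dev/null       # ensure the filesystem is not mounted
xfs_repair -n /dev/mapper/vg0-data # dry run first, to see what it would do
xfs_repair /dev/mapper/vg0-data    # then the real repair
```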

What do you think about there being no rebuild?
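In the meantime I'll run a scrub to confirm the array really is consistent (assuming /dev/md0):

```shell
# Ask md to do a full read-and-compare pass over the array;
# any parity inconsistencies it finds are tallied in mismatch_cnt.
echo check > /sys/block/md0/md/sync_action
cat /proc/mdstat                       # watch progress
cat /sys/block/md0/md/mismatch_cnt    # 0 when idle again means clean
```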

Cheers

Jonathan


On Thu, 2018-08-02 at 19:54 +0100, Wols Lists wrote:
> On 02/08/18 10:13, Jonathan Milton wrote:
> > Hi,
> > 
> > Overnight my server had problems with its RAID5 (xfs corrupt inodes), on
> > reboot the raid comes up inactive.
> > 
> > * Smarttools suggest disks (3x2TB) are healthy. I have powered down and
> > checked all the SATA leads are still plugged correctly.
> > 
> > * mdadm is only able to assemble the raid from 1 drive:
> > # mdadm --assemble /dev/md0 /dev/sda1 /dev/sdc1 /dev/sdd1
> > mdadm: /dev/md0 assembled from 1 drive - not enough to start the array.
> > 
> > * Event counts are well off on one drive (290391/182871/290391)
> 
> Not good.
> > 
> > * SCT Error Recovery Control was disabled on all drives prior to this
> > failure but I have since modified the boot scripts to set to 7s as per
> > the wiki (no improvement)
> > 
> > I am considering whether to try --force and would like advice from
> > experts first
> > 
> 
> NOT WITHOUT A BACKUP!
> > 
> > Thanks in advance
> > 
> 
> That "only one drive" bothers me. Have you got any spare drives? Have
> you any spare SATA ports to upgrade to raid-6?
> 
> I'd ddrescue the two drives with the highest count (is that sda and
> sdd?), then force assemble the copies. That stands a good chance of
> succeeding. If that works, you can add back the third drive to recover
> your raid-5 - keeping the original two as a temporary backup.
> 
> If you can't get spare drives, overlay the two good drives then see if a
> force gets you a working array. If it does, then you can try it without
> the overlay, but not having a backup increases the risk ...
> 
> Then add one of the original drives back to convert to raid-6.
> 
> The event counts make me suspect the middle drive got booted long ago
> for some reason, then you've had a hiccup that booted a second drive.
> Quite likely if you didn't have ERC enabled. So it does look like an
> easy fix but because you've effectively got a broken raid-0 at present,
> the risk to your data from any further problem is HIGH. Read
> 
> https://raid.wiki.kernel.org/index.php/Linux_Raid#When_Things_Go_Wrogn
> 
> 
> If you don't have any spare SATA ports, go and buy something like
> 
> https://www.amazon.co.uk/dp/B00952N2DQ/ref=twister_B01DUJJZ8U?_encoding=UTF8&th=1
> 
> You want a card with one SATA *and* one eSATA - beware - I think most of
> these have a jumper to switch between SATA *or* eSATA, so you'll want a
> card that claims two of each - it will only actually drive two SATA
> devices, so configure one port as SATA for your raid-6, and one as
> eSATA so you can temporarily add external disks ...
> 
> https://www.amazon.co.uk/iDsonix-SuperSpeed-Docking-Station-Free-Black/dp/B00L3W0F40/ref=sr_1_1?ie=UTF8&qid=1529780418&sr=8-1&keywords=eSATA%2Bdisk%2Bdocking%2Bstation&th=1
> 
> Not sure whether you can connect this with an eSATA port-multiplier
> cable - do NOT run raid over the USB connection !!!
> 
> Cheers,
> Wol
