Hi Jorge,
Very good report.
On 3/30/19 7:02 PM, Jorge R. Frank wrote:
> Running Ubuntu 14.04, mdadm 3.2.5-5ubuntu4.4.
>
> The array is 4x 1 TB drives, /dev/sd[bcde]1. /dev/sda is the boot drive
> and not part of the array.
>
> Have not attempted any potentially destructive troubleshooting steps
> like mdadm --force; seeking advice first.
OK. Using --assemble --force with explicit devices is safe. It just
doesn't work in severe cases.
> Have determined that the drives came with SCT ERC disabled by default.
> I ran the script on this page successfully but have not yet set it to
> run at every boot:
>
> <https://raid.wiki.kernel.org/index.php/Timeout_Mismatch>
Consider not buying cheap drives when the time comes to replace them.
The boot script will suit until then.
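For reference, a minimal sketch of what such a boot-time script does,
along the lines of the wiki page above. This assumes /dev/sd[bcde] are
the array members; adjust the glob and timeout values for your setup
(smartctl's scterc values are in tenths of a second):

```shell
#!/bin/sh
# For each array member, try to set SCT ERC to 7.0 seconds so the drive
# gives up on a bad sector before the kernel's default 30 s SCSI timeout.
for dev in /dev/sd[bcde]; do
    if smartctl -l scterc,70,70 "$dev" > /dev/null 2>&1; then
        # Drive accepted ERC: the stock 30 s kernel timeout is fine.
        echo 30 > "/sys/block/${dev##*/}/device/timeout"
    else
        # Drive refuses SCT ERC (typical of cheap desktop drives):
        # raise the kernel timeout above the drive's internal retry limit.
        echo 180 > "/sys/block/${dev##*/}/device/timeout"
    fi
done
```

On Ubuntu 14.04 dropping this into /etc/rc.local (before the `exit 0`)
is the simplest way to run it at every boot.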
> mdadm --stop /dev/md0 followed by mdadm --assemble --scan gave the
> following output:
>
>   mdadm: /dev/md0 assembled from 2 drives - not enough to start the array.
>
> dmesg showed errors from sdd and sde.
>
> mdadm --examine showed 1066 events on sdb and sdc, 1055 events on both
> sdd and sde. If one drive had failed first I would have expected a lower
> event count on just that one drive. Also, the array state is
> inconsistent in the mdadm --examine output: sdb and sdc show "AA..",
> while sdd and sde show "AAAA".
>
> mdadm, dmesg, lsdrv, and smartctl outputs are attached. The smartctl
> outputs were taken after fixing the timeout mismatch. The dmesg output
> is only the new messages from the mdadm --stop and mdadm --assemble
> commands.
All of this is consistent with a controller issue knocking out those two
drives simultaneously. The correct solution is to use --assemble
--force with explicit device names (not using --scan).
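Concretely, something like the following, assuming the member partitions
are as you listed above (double-check against your --examine output
before running):

```shell
# Stop any partial assembly, then force-assemble with the members named
# explicitly. --force lets mdadm adopt the two members whose event
# counts (1055) lag slightly behind the freshest pair (1066).
mdadm --stop /dev/md0
mdadm --assemble --force /dev/md0 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1

# Verify the array state before touching the filesystem:
cat /proc/mdstat
mdadm --detail /dev/md0
```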
You should use fsck to clean up any unavoidable filesystem corruption
from in-flight I/O before mounting.
Phil