Hi Jorge,
Very good report.
On 3/30/19 7:02 PM, Jorge R. Frank wrote:
> Running Ubuntu 14.04, mdadm 3.2.5-5ubuntu4.4.
>
> The array is 4x 1 TB drives, /dev/sd[bcde]1. /dev/sda is the boot drive
> and not part of the array.
>
> Have not attempted any potentially destructive troubleshooting steps
> like mdadm --force; seeking advice first.
OK. Using --assemble --force with explicit devices is safe. It just
doesn't work in severe cases.
> Have determined that the drives came with SCT ERC disabled by default.
> I ran the script on this page successfully but have not yet set it to
> run at every boot:
>
> <https://raid.wiki.kernel.org/index.php/Timeout_Mismatch>
Consider not buying cheap drives when the time comes to replace them.
The boot script will suit until then.
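For reference, a minimal sketch of what such a boot-time script does,
along the lines of the wiki page above. This assumes /dev/sd[bcde] are
the array members; adjust the glob and timeout values for your setup
(smartctl's scterc values are in tenths of a second):

```shell
#!/bin/sh
# For each array member, try to set SCT ERC to 7.0 seconds so the drive
# gives up on a bad sector before the kernel's default 30 s SCSI timeout.
for dev in /dev/sd[bcde]; do
    if smartctl -l scterc,70,70 "$dev" > /dev/null 2>&1; then
        # Drive accepted ERC: the stock 30 s kernel timeout is fine.
        echo 30 > "/sys/block/${dev##*/}/device/timeout"
    else
        # Drive refuses SCT ERC (typical of cheap desktop drives):
        # raise the kernel timeout above the drive's internal retry limit.
        echo 180 > "/sys/block/${dev##*/}/device/timeout"
    fi
done
```

On Ubuntu 14.04 dropping this into /etc/rc.local (before the `exit 0`)
is the simplest way to run it at every boot.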
> mdadm --stop /dev/md0 followed by mdadm --assemble --scan gave the
> following output:
>
>   mdadm: /dev/md0 assembled from 2 drives - not enough to start the array.
>
> dmesg showed errors from sdd and sde.
>
> mdadm --examine showed 1066 events on sdb and sdc, 1055 events on both
> sdd and sde. If one drive had failed first I would have expected a lower
> event count on just that one drive. Also, the array state is
> inconsistent in the mdadm --examine output: sdb and sdc show "AA..",
> while sdd and sde show "AAAA".
>
> mdadm, dmesg, lsdrv, and smartctl outputs are attached. The smartctl
> outputs were taken after fixing the timeout mismatch. The dmesg output
> is only the new messages from the mdadm --stop and mdadm --assemble
> commands.
All of this is consistent with a controller issue knocking out those two
drives simultaneously. The correct solution is to use --assemble
--force with explicit device names (not using --scan).
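Concretely, something like the following, assuming the member partitions
are as you listed above (double-check against your --examine output
before running):

```shell
# Stop any partial assembly, then force-assemble with the members named
# explicitly. --force lets mdadm adopt the two members whose event
# counts (1055) lag slightly behind the freshest pair (1066).
mdadm --stop /dev/md0
mdadm --assemble --force /dev/md0 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1

# Verify the array state before touching the filesystem:
cat /proc/mdstat
mdadm --detail /dev/md0
```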
You should use fsck to clean up any unavoidable filesystem corruption
from in-flight I/O before mounting.
Phil