On 3/31/19 2:14 PM, Jorge R. Frank wrote:
On 3/31/19 00:03, Phil Turmel wrote:
Consider not buying cheap drives when the time comes to replace. The
boot script will suit until then.
In my defense, I was young, stupid, and unsupervised when I built the
array. Hard to argue with the results. The system has been running
practically 24/7 since December 2008 and this is the first glitch I
couldn't fix by simply re-seating SATA cables and rebooting.
You've been extraordinarily lucky.
One thing I would like to confirm is where to call the SCT ERC script in
the boot process. The wiki wasn't clear on that point.
It's not clear because it varies so much from distro to distro and even
within distro versions. Basically, it should be in your distro's
version of rc.local, or even better, triggered by udev rules.
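As a rough sketch of the rc.local-style approach: loop over the array members, try to set a 7-second SCT ERC, and fall back to a long kernel timeout on drives that refuse the command. The device names are illustrative, not your actual members, and the fallback assumes smartctl exits nonzero when SCT ERC is unsupported.

```shell
#!/bin/sh
# Set 7.0-second read/write error recovery on each array member drive.
# Substitute your real device names for sda..sde.
for dev in sda sdb sdd sde; do
    if ! smartctl -l scterc,70,70 /dev/$dev >/dev/null 2>&1; then
        # Drive doesn't support SCT ERC: raise the kernel's command
        # timeout instead, so md outwaits the drive's internal retries.
        echo 180 > /sys/block/$dev/device/timeout
    fi
done
```

The udev variant runs the same smartctl call per-device at hotplug time, e.g. a rule like `ACTION=="add", SUBSYSTEM=="block", KERNEL=="sd[a-z]", RUN+="/usr/sbin/smartctl -l scterc,70,70 /dev/%k"` (path to smartctl may differ on your distro).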
All of this is consistent with a controller issue knocking out those
two drives simultaneously. The correct solution is to use --assemble
--force with explicit device names (not using --scan).
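A minimal sketch of that invocation, assuming the array is md0 and the members are whole-disk or first-partition devices (substitute your real names; don't trust these):

```shell
# Stop any half-assembled remnant first, then force-assemble with
# explicit member devices rather than --scan.
mdadm --stop /dev/md0
mdadm --assemble --force /dev/md0 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1

# Verify every member came back active before touching the filesystem.
cat /proc/mdstat
mdadm --detail /dev/md0
```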
You should use fsck to clean up any unavoidable fs corruption from
in-flight I/O before mounting.
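For an ext-family filesystem, that would look something like the following; the read-only pass first lets you gauge the damage before committing to repairs:

```shell
# Report-only pass: makes no changes, just lists what fsck would fix.
fsck -n /dev/md0
# Actual repair, answering yes to prompts (ext2/3/4-style semantics).
fsck -y /dev/md0
```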
Would you recommend explicitly including all four devices, since sdd and
sde have the same event count? Or just three, arbitrarily picking one of
sdd/sde to include, then adding a new fourth drive?
Use all four. That way, if there are any lurking UREs, the array will
fix itself (slowly, because of the long timeouts).
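You can force that self-healing pass rather than waiting for UREs to surface during normal I/O, by kicking off a scrub once the array is assembled (md0 is illustrative):

```shell
# Walk every sector; md rewrites any unreadable blocks from parity.
echo repair > /sys/block/md0/md/sync_action

# Watch progress -- slow on drives with long internal retry timeouts.
watch cat /proc/mdstat
```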
Due to the age of
the system and the fact that the motherboard SATA controller now has a
strike against it, my plan upon recovery is to immediately back up the
array and replace the entire system. So if the former would work on a
short-term basis, I'd be willing to try it.
Replacing the system is less important than replacing the drives. If at
all possible, move to raid6.
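If the replacement system keeps md, the raid5-to-raid6 move can be done online with a fifth drive; a sketch, with the device names and backup-file path as placeholders:

```shell
# Add the new drive as a spare, then reshape 4-drive raid5 -> 5-drive raid6.
mdadm --add /dev/md0 /dev/sdf1
mdadm --grow /dev/md0 --level=6 --raid-devices=5 \
      --backup-file=/root/md0-grow.backup
```

The backup file must live on a device outside the array, since it protects the critical section of the reshape if power fails mid-way.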
Thanks again,
JRF
You're welcome.
Phil