Re: mdadm resync causes stable system to crash every 2 or 3 hours

Ryan Patterson <ryan.goat@xxxxxxxxx> · Tue, 7 Sep 2021 18:38:46 -0400

On Tue, Sep 7, 2021 at 1:01 AM Wols Lists <antlists@xxxxxxxxxxxxxxx> wrote:
>
> On 07/09/21 01:44, Ryan Patterson wrote:
> > My file server is usually very stable.  The past week I had two mdadm
> > arrays that required resync operations.
> > * newly created raid6 array (14 x 16TB seagate exos)
> > * existing raid 6 array, after a reboot resync on hot spare (14 x 4TB
> > seagate barracuda)
>
> Aaarghhh
>
> See
> https://raid.wiki.kernel.org/index.php/Linux_Raid
>
> And especially
> https://raid.wiki.kernel.org/index.php/Timeout_Mismatch
>
> That might not be your problem, but it's the very first thing you should
> address!
>
> Cheers,
> Wol

Thanks for the links.  I'm confident my issue is not related to
Timeout Mismatch.  I've experienced that before, and it manifested
with different symptoms, i.e. mdadm would think a disk had failed and
remove it from the array, but the system/OS stayed stable the whole time.

The Seagate Exos drives support SCT Error Recovery Control and it is
set correctly.  The Barracuda drives do not support it, but I've used
these drives for six or seven years without issue.
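
For anyone who wants to repeat the check, here is a rough Python sketch
of the comparison the Timeout_Mismatch page describes.  It is only an
illustration under assumptions: the function names are made up, the
sample strings are hypothetical stand-ins for `smartctl -l scterc`
output (not captured from my drives), and the 30 s and 180 s figures
are the usual kernel default and the wiki's suggested timeout for
drives without ERC, as I understand them.

```python
import re

# Assumed values, not measured on any particular system:
DEFAULT_KERNEL_TIMEOUT_S = 30   # usual default in /sys/block/<dev>/device/timeout
NO_ERC_SAFE_TIMEOUT_S = 180     # wiki's suggested timeout when ERC is unavailable


def parse_scterc(smartctl_output):
    """Return (read_s, write_s) in seconds, or None if ERC is unsupported/disabled."""
    values = {}
    for key in ("Read", "Write"):
        # smartctl reports the limit in units of 100 ms, e.g. "Read:  70 (7.0 seconds)"
        m = re.search(rf"^\s*{key}:\s+(\d+)\s+\(", smartctl_output, re.MULTILINE)
        if m is None:
            return None
        values[key] = int(m.group(1)) / 10.0
    return values["Read"], values["Write"]


def timeout_mismatch(smartctl_output, kernel_timeout_s=DEFAULT_KERNEL_TIMEOUT_S):
    """True if the drive may still be retrying when the kernel gives up on it."""
    erc = parse_scterc(smartctl_output)
    if erc is None:
        # No usable ERC: only a long kernel timeout avoids the mismatch.
        return kernel_timeout_s < NO_ERC_SAFE_TIMEOUT_S
    return max(erc) >= kernel_timeout_s


# Hypothetical sample outputs, for illustration only.
ERC_ENABLED = (
    "SCT Error Recovery Control:\n"
    "           Read:     70 (7.0 seconds)\n"
    "          Write:     70 (7.0 seconds)\n"
)
ERC_UNSUPPORTED = "SCT Error Recovery Control command not supported\n"
```

In practice the first argument would be the text printed by
`smartctl -l scterc /dev/sdX` and the second the number read from
`/sys/block/sdX/device/timeout`.  By this logic a drive without ERC
only counts as safe once its kernel timeout has been raised.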

-Ryan