mdadm resync causes stable system to crash every 2 or 3 hours

My file server is usually very stable.  Over the past week, two mdadm
arrays required resync operations:
* a newly created RAID 6 array (14 x 16TB Seagate Exos)
* an existing RAID 6 array, resyncing onto a hot spare after a reboot
(14 x 4TB Seagate Barracuda)

During both resync operations (they ran one at a time), the system
hit a major error and needed a hard reboot every two or three hours.
I saw several kinds of errors, such as:
* kernel watchdog soft lockups [md127_raid6:364]
* general protection faults (I have a few saved with the full exception stack)
* exceptions in IOMMU routines (again, I have the full error with
the exception stack saved)
* full system lockup

I doubt that a bug in mdadm caused this behavior, but it was very
predictable and repeatable while the resync operations were in
progress.

How can I avoid these errors the next time I have an array in need of a resync?

OS: Debian 11 bullseye
kernel: 5.10.0-8-amd64 #1 SMP Debian 5.10.46-4 (2021-08-03)
mdadm: v4.1 - 2018-10-01
SATA HBA: 3 x LSI SAS 9201-16i
_____________
Ryan Patterson
May the wings of liberty never lose a feather.


