Re: smart short test crashes software raid array?

On Sat, 9 Mar 2019 01:09:18 -0500
Kent Dorfman <kent.dorfman766@xxxxxxxxx> wrote:

> On two occasions I have experienced two of four slices in a four disk
> array going offline in the wee hours of the morning while doing heavy
> batch processing and concurrently running smart short tests on the
> drives via crontab.
> 
> My crontab consecutively starts the short test on all drives, and
> since it backgrounds them, they all are running concurrently.
> 
> I suspect that the linux implementation of software raid is not able
> to handle the heavy disk IO while all the drives are also doing their
> smart test.
> 
> The first time this happened the first two stripes went offline, and
> the second time the other two went offline.  Luckily on both occasions
> I was able to reassemble the array (eight hour process) and did not
> lose any data.

It will be difficult for anyone to guess what happened without the complete
corresponding dmesg output, which would show what "crashes" or "went offline"
actually meant. The drive models matter too: SMR-based drives, for instance,
are known to struggle under heavy concurrent load.
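
As a rough illustration of what would be useful to capture right after the
next failure, here is a minimal Python sketch. It assumes smartmontools is
installed, that it runs as root, and that the four drives are /dev/sda
through /dev/sdd; the device names and the helper names are hypothetical and
need adjusting to the actual system.

```python
import subprocess


def diagnostic_commands(drives):
    """Build the commands (as argv lists) worth capturing after a failure."""
    cmds = [
        ["dmesg"],                 # kernel log: link resets, timeouts, md events
        ["cat", "/proc/mdstat"],   # current state of the md arrays
    ]
    for dev in drives:
        cmds.append(["smartctl", "-i", dev])              # drive model, firmware
        cmds.append(["smartctl", "-l", "selftest", dev])  # self-test history
    return cmds


def collect(drives, run=subprocess.run):
    """Run each command and return one text report with all of the output."""
    report = []
    for argv in diagnostic_commands(drives):
        header = "$ " + " ".join(argv)
        try:
            result = run(argv, capture_output=True, text=True)
            body = result.stdout + result.stderr
        except OSError as exc:
            # e.g. smartctl is not installed on this machine
            body = "could not run: %s\n" % exc
        report.append(header + "\n" + body)
    return "\n".join(report)


if __name__ == "__main__":
    # Assumed device names; replace with the members of the real array.
    print(collect(["/dev/sda", "/dev/sdb", "/dev/sdc", "/dev/sdd"]))
```

Redirecting the output to a file and attaching it to a reply would cover both
the kernel messages and the drive models asked about above.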

A SMART test by itself shouldn't cause issues: the drive treats it as
low-priority I/O, so in theory the host shouldn't even be aware that it is
running.

-- 
With respect,
Roman


