Fwd: Question / Request about timeouts of SATA harddisks [was: devices get kicked from RAID about once a month]

Below is the message I posted on linux-scsi regarding the timeouts.
I guess my approach is a bit different from Bill's, but my experience at
work (we keep getting 100% OK drives back from customers who claim they
are broken) is that the internal error correction of disk drives gets
them kicked out of RAIDs too often.  And that's because the read
commands time out too early.

Stefan

-------- Original Message --------
Subject: Question / Request about timeouts of SATA harddisks
Date: Thu, 03 Jun 2010 08:32:45 +0200
From: Stefan /*St0fF*/ Hübner  <stefan.huebner@xxxxxxxxxxxxxxxxxx>
Reply-To: st0ff@xxxxxx
To: linux-scsi@xxxxxxxxxxxxxxx

Dear list,

Concerning RAIDs built from desktop-class drives, it would be good to
know whether there is a kernel timeout value that determines how long a
disk drive may take to process a command.  When a drive does not respond
in time, I have seen in my logs, with many disks, that the sg-eh becomes
active and resets the bus.  So there must be such a timeout somewhere.

The question is: can this timeout value be found somewhere in sysfs?
If yes, where?  If no, can it be exported?
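
For reference, a minimal sketch of how such a value could be read and
changed from user space.  It assumes the disk is handled by the SCSI
layer (as libata-driven SATA disks are), in which case the per-device
command timeout is exposed, in seconds, as
/sys/block/<disk>/device/timeout.  The function names and the sysfs_root
parameter are hypothetical helpers for illustration only; writing the
file requires root.

```python
from pathlib import Path


def timeout_path(disk: str, sysfs_root: str = "/sys") -> Path:
    """Build the path of the SCSI command timeout attribute for a disk.

    `disk` is a kernel block device name such as "sda".  `sysfs_root` is
    only a parameter so the helpers can be tried against a scratch tree.
    """
    return Path(sysfs_root) / "block" / disk / "device" / "timeout"


def read_command_timeout(disk: str, sysfs_root: str = "/sys") -> int:
    """Return the current command timeout of `disk` in seconds."""
    return int(timeout_path(disk, sysfs_root).read_text().strip())


def write_command_timeout(disk: str, seconds: int,
                          sysfs_root: str = "/sys") -> None:
    """Set the command timeout of `disk` to `seconds` (needs root)."""
    if seconds <= 0:
        raise ValueError("timeout must be a positive number of seconds")
    timeout_path(disk, sysfs_root).write_text(f"{seconds}\n")
```

Calling read_command_timeout("sda") on a live system prints nothing by
itself; it just returns the value the kernel currently uses for that
disk.  A value written this way does not survive a reboot or a hot-plug
of the device.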

Suggestions for the maximum of this timeout value:
I've read in multiple places that the internal error correction of
desktop-class drives can take even more than 2 minutes to complete.  But
another, much bigger value can be taken directly from the
IDENTIFY_DEVICE_STRUCTURE, word 89, which states approximately how long
a SECURITY_ERASE_UNIT command takes (in minutes; see ATA8-ACS, table 32
for the exact meaning of the value).  In some cases an even longer time
can be found in word 90 (time estimate for enhanced security erase
unit, table 33).
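
A minimal sketch of pulling these two words out of a raw 512-byte
IDENTIFY DEVICE buffer.  The word layout (256 little-endian 16-bit
words) is standard; the decoding of the value is an assumption based on
my reading of ATA8-ACS (0 = no estimate given, 1..254 = value x 2
minutes, 255 = more than 508 minutes) and should be checked against
tables 32 and 33.  How the buffer is obtained is left open, and all
function names are hypothetical.

```python
import struct
from typing import Optional, Tuple

IDENTIFY_SIZE = 512  # IDENTIFY DEVICE data: 256 words of 16 bits each


def identify_word(data: bytes, index: int) -> int:
    """Return word `index` (0..255) of a raw IDENTIFY DEVICE buffer."""
    if len(data) != IDENTIFY_SIZE:
        raise ValueError("expected a 512-byte IDENTIFY DEVICE buffer")
    if not 0 <= index < IDENTIFY_SIZE // 2:
        raise IndexError("word index out of range")
    # Words are stored little-endian, word n at byte offset 2*n.
    return struct.unpack_from("<H", data, 2 * index)[0]


def decode_erase_time(word: int) -> Optional[Tuple[int, bool]]:
    """Decode an erase-time word into (minutes, is_lower_bound).

    Returns None when the drive gives no estimate.  ASSUMPTION: only the
    low byte is used, with the ATA8-ACS encoding described above.
    """
    value = word & 0xFF
    if value == 0:
        return None
    if value == 255:
        return (508, True)   # "more than 508 minutes"
    return (value * 2, False)


def erase_estimates(data: bytes):
    """Return the decoded estimates from word 89 (normal) and 90 (enhanced)."""
    return (decode_erase_time(identify_word(data, 89)),
            decode_erase_time(identify_word(data, 90)))
```

Under that assumed encoding, a drive that indicates about 3 hours would
report the value 90 in word 89.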

I know for sure that this command should not be issued from Linux, at
least not in the current state of libata.  I have done it twice, on two
different Samsung HD103UJ drives, to "securely erase customer data".
While the command was executing, libata made the PassThru command fail
within a minute and, according to syslog, tried unsuccessfully to reset
the drive.  The results (after letting the drives run for about the
indicated 3 hours): one drive broke completely and was never again
recognized on any computer; the other drive was still recognized but no
longer operated properly.

Any comments and hints are very welcome.  All the best,
Stefan
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

