Re: Catastrophic disk failure, where was smartd?

Roger Heflin <rogerheflin@xxxxxxxxx> · Wed, 26 Mar 2008 13:28:01 -0500

Bruno Wolff III wrote:
> On Wed, Mar 26, 2008 at 08:35:49 -0500,
>   "David G. Mackay" <mackay_d@xxxxxxxxxxxxx> wrote:
>> Shouldn't there have been some indication of problems prior to the
>> failure?
>
> Only if you are lucky. Someone at Google published some information about
> SMART around a year ago. In a high percentage of catastrophic failures,
> there is no warning from SMART.

The big issue is that most SMART implementations don't scan the disk for
bad blocks. In my experience several years ago, with 1000+ disks in service,
the #1 failure was bad blocks, and SMART did little to catch that. The #2
failure was failure to spin up at all, but this seemed to be confined to
certain batches.

One thing I would do is run a simple "dd if=/dev/sdx of=/dev/null bs=1M" on
each disk, maybe once per week or once per month, to scan it yourself. If the
disk detects a sector getting too many errors (still correctable with the
extra bits it has), it will move the data from the bad sector to a spare and
mark the bad sector bad, and I believe SMART counts when this has been done.
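
A minimal sketch of that periodic scan, in case it helps. The function name
scan_disk and the "read OK" / "READ ERROR" messages are my own for
illustration, and it assumes GNU dd (for bs=1M) and that you pass the device
paths yourself, e.g. from a weekly cron job run as root:

```shell
#!/bin/sh
# Read every block of a device (or file) and throw the data away.
# dd's statistics and error text are discarded; only its exit status
# is used: 0 means the whole thing was readable, non-zero means not.
scan_disk() {
    dd if="$1" of=/dev/null bs=1M 2>/dev/null
}

# Scan each path given on the command line, e.g. /dev/sda /dev/sdb.
for dev in "$@"; do
    if scan_disk "$dev"; then
        echo "$dev: read OK"
    else
        echo "$dev: READ ERROR"
    fi
done
```

This only reads, so it should be safe on a mounted disk, though it will
compete with normal I/O while it runs. Afterwards, "smartctl -A /dev/sdx"
from smartmontools lists the drive's attributes; on many drives the remapped
sector count shows up there as Reallocated_Sector_Ct, but attribute names
vary by vendor.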

                               Roger

--
fedora-list mailing list
fedora-list@xxxxxxxxxx
To unsubscribe: https://www.redhat.com/mailman/listinfo/fedora-list