Gregory Youngblood wrote:
> My way of thinking, and what I was referring to above, was to use
> those error conditions to identify drives to change before they report
> complete failure. Yes, that will mean changing drives before SMART
> actually says there is a full failure, and you may have to fight to get
> a drive replaced under warranty when you do so, but you are protecting
> your data.
I actually find it surprisingly easy to get a disk replaced based on a
printed SMART report showing uncorrectable sectors or just very high
reallocated sector counts etc. Almost suspiciously easy. I would not be
at all surprised if the disk vendors are, at least for their 7200rpm
SATA disks, recording a "black mark" against the serial number, doing a
low level reformat and sending them back out as a new disk to another
customer. Some of the "new" disks I've received have lifetimes and logs
that suggest they might be such refurbs - much longer test logs than
most new drives for example, as well as earlier serial numbers than
others ordered at the same time. They're also much, much more likely to
be DOA or develop defects early.
> I agree with you completely that waiting for SMART to actually indicate
> a true failure is pointless due to the thresholds set by mfrs. But using
> SMART for early warning signs still has value IMO.
I could not agree more. smartmontools is right up there with tools like
wireshark, mtr, and tcptraceroute in my most-vital toolbox, and it's
mostly because of its ability to examine the vendor attributes and kick
off scheduled self tests.
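
To illustrate the kind of early-warning check on vendor attributes
described above, here is a minimal Python sketch. It assumes the usual
column layout of `smartctl -A` output (raw value in the tenth column).
The function names and the list of watched attributes are my own
choices for illustration and are not part of smartmontools, and
attribute names vary between vendors, so treat it as a starting point
only.

```python
import subprocess

# Attributes whose raw value normally stays at zero on a healthy disk.
# This watch list is an assumption for illustration, not a vendor-defined set.
WATCHED = {
    "Reallocated_Sector_Ct",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
}


def early_warnings(smartctl_output, watched=WATCHED):
    """Return {attribute_name: raw_value} for watched attributes with a nonzero raw value."""
    warnings = {}
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least ten columns.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        name, raw = fields[1], fields[9]
        if name in watched and raw.isdigit() and int(raw) > 0:
            warnings[name] = int(raw)
    return warnings


def check_device(device):
    """Run `smartctl -A` on one device (needs smartmontools, usually as root)."""
    result = subprocess.run(
        ["smartctl", "-A", device], capture_output=True, text=True
    )
    return early_warnings(result.stdout)
```

Calling check_device("/dev/sda") returns an empty dict when none of the
watched counters has moved off zero; anything else is a reason to start
planning a replacement rather than waiting for the overall SMART health
status to flip.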
I've saved a great deal of dead-disk-replacement hassle by ensuring that
smartd is configured to run extended self tests on the disks in all the
machines I operate at least fortnightly, and short tests at least
weekly. Being able to plan ahead to swap a dying disk is very nice indeed.
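
For anyone wanting to set up a similar schedule, a smartd.conf entry
along these lines should do it. This is a sketch, not my exact
configuration: the days and hours are arbitrary, the 1st and 15th of
the month are only roughly fortnightly, and the file location depends
on your distribution.

```
# smartd.conf
# -a : monitor all SMART attributes and overall health status
# -s : schedule self tests; the regex is matched against T/MM/DD/d/HH
#      S/../../7/02       short test every Sunday at 02:00
#      L/../(01|15)/./03  long (extended) test on the 1st and 15th at 03:00
DEVICESCAN -a -s (S/../../7/02|L/../(01|15)/./03)
```

Pick hours when the machine is quiet, since an extended self test
competes with normal I/O while it runs.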
--
Craig Ringer
--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)