Re: S.M.A.R.T

"Richard Karhuse" <rkarhuse@xxxxxxxxx> · Sat, 30 Aug 2008 04:57:10 -0400

On Sat, Aug 30, 2008 at 4:08 AM, Mag Gam <magawake@xxxxxxxxx> wrote:

At my physics lab we have 30 servers with 1TB disk packs. I am in need
of monitoring for disk failures. I have been reading about SMART and
it seems it can help. However, I am not sure what to look for if a
drive is about to fail. Any thoughts about this? Is anyone using this
method to predetermine disk failures?

Here are a few references from my archives w.r.t. SMART ...

Hope they help ...

   -rak-

====

http://hardware.slashdot.org/hardware/07/02/18/0420247.shtml

			Google Releases Paper on Disk Reliability
"The Google engineers just published a paper on 'Failure Trends in a Large Disk Drive Population'.
Based on a study of 100,000 disk drives over 5 years they find some
interesting stuff. To quote from the abstract: 'Our analysis identifies
several parameters from the drive's self monitoring facility (SMART)
that correlate highly with failures. Despite this high correlation, we
conclude that models based on SMART parameters alone are unlikely to be
useful for predicting individual drive failures. Surprisingly, we found
that temperature and activity levels were much less correlated with
drive failures than previously reported.'"

http://hardware.slashdot.org/hardware/07/02/21/004233.shtml

			Everything You Know About Disks Is Wrong
"Google's wasn't the best storage paper at FAST '07.
Another, more provocative paper looking at real-world results from
100,000 disk drives got the 'Best Paper' award. Bianca Schroeder, of
CMU's Parallel Data Lab, submitted 'Disk failures in the real world: What does an MTTF of 1,000,000 hours mean to you?'
The paper crushes a number of (what we now know to be) myths about
disks such as vendor MTBF validity, 'consumer' vs. 'enterprise' drive
reliability (spoiler: no difference), and RAID 5 assumptions.
StorageMojo has a good summary of the paper's key points."
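
To put the MTTF figure in that paper title into perspective, it can be converted to a rough annualized failure rate (AFR). The sketch below is only the usual back-of-the-envelope approximation (hours per year divided by MTTF, reasonable when the MTTF is much longer than a year), not a calculation taken from the paper. The function names and the 100-drive fleet size are made up for illustration.

```python
HOURS_PER_YEAR = 24 * 365  # 8760, ignoring leap years


def annualized_failure_rate(mttf_hours: float) -> float:
    """Approximate fraction of drives failing per year of 24/7 operation.

    Assumes a constant failure rate and an MTTF much longer than one year.
    """
    return HOURS_PER_YEAR / mttf_hours


def expected_failures_per_year(drive_count: int, mttf_hours: float) -> float:
    """Expected number of drive failures per year across a fleet."""
    return drive_count * annualized_failure_rate(mttf_hours)


# Datasheet-style MTTF from the paper title.
afr = annualized_failure_rate(1_000_000)
print(f"AFR for an MTTF of 1,000,000 hours: {afr:.3%}")

# Hypothetical fleet of 100 drives (an assumption, not a number from this thread).
print(f"Expected failures per year in 100 drives: "
      f"{expected_failures_per_year(100, 1_000_000):.2f}")
```

So a 1,000,000 hour MTTF corresponds to a bit under 0.9% of drives per year on paper, which is the kind of vendor figure the summary above says the paper calls into question.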

http://www.linuxjournal.com/article/6983?from=50&comments_per_page=50

Monitoring Hard Disks with SMART

    By Bruce Allen on Thu, 2004-01-01 02:00.
    SysAdmin

    One of your hard disks might be trying to tell you it's not long for this
    world. Install software that lets you know when to replace it.

It's a given that all disks eventually die, and it's easy to see why. The platters in a modern disk drive
rotate more than a hundred times per second, maintaining submicron tolerances between the disk heads and the
magnetic media that store data. Often they run 24/7 in dusty, overheated environments, thrashing on heavily
loaded or poorly managed machines. So, it's not surprising that experienced users are all too familiar with
the symptoms of a dying disk. Strange things start happening. Inscrutable kernel error messages cover the
console and then the system becomes unstable and locks up. Often, entire days are lost repeating recent work,
re-installing the OS and trying to recover data. Even if you have a recent backup, sudden disk failure is a
minor catastrophe.

http://smartmontools.sourceforge.net/

smartmontools Home Page

Welcome! This is the home page for the smartmontools package.
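
As a starting point for "what to look for", here is a minimal sketch that reads the attribute table printed by `smartctl -A` (part of smartmontools) and reports a few counters when their raw value is nonzero. The watch list and the "anything above zero" rule are assumptions for illustration, not recommendations from the references above, and the function names are hypothetical. Given the Google abstract quoted earlier, treat a hit as a hint to look closer rather than a prediction.

```python
import subprocess

# Assumed watch list: counters for reallocated or pending sectors,
# using the attribute names that smartctl prints for ATA drives.
WATCHED = (
    "Reallocated_Sector_Ct",
    "Reallocated_Event_Ct",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
)


def parse_attributes(smartctl_output: str) -> dict[str, int]:
    """Map attribute name -> integer raw value from `smartctl -A` text.

    Table rows start with a numeric ID and have the raw value in the
    tenth column; rows whose raw value is not a plain integer are skipped.
    """
    attrs = {}
    for line in smartctl_output.splitlines():
        fields = line.split()
        if len(fields) < 10 or not fields[0].isdigit():
            continue  # header, blank line, or other non-table text
        try:
            attrs[fields[1]] = int(fields[9])
        except ValueError:
            continue
    return attrs


def suspicious(attrs: dict[str, int], watched=WATCHED) -> dict[str, int]:
    """Return the watched attributes whose raw value is above zero."""
    return {name: attrs[name] for name in watched if attrs.get(name, 0) > 0}


def read_attributes(device: str) -> dict[str, int]:
    """Run `smartctl -A` on a device (usually needs root) and parse it."""
    # smartctl's exit status is a bit mask, so a nonzero status is not
    # treated as a hard error here.
    result = subprocess.run(
        ["smartctl", "-A", device],
        capture_output=True, text=True, check=False,
    )
    return parse_attributes(result.stdout)
```

Calling `suspicious(read_attributes("/dev/sda"))` from a cron job on each server and mailing any non-empty result would be one simple way to use this; the smartd daemon shipped with smartmontools covers the same ground with far more options.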

_______________________________________________
CentOS mailing list
CentOS@xxxxxxxxxx
http://lists.centos.org/mailman/listinfo/centos