Re: Bad Magic Number in Superblock - Any trick for Arch or for new kernels?




On 06/10/2010 05:06 PM, David C. Rankin wrote:
> 	Your experience sounds exactly like mine over the past year. I have had 4
> Seagate drives supposedly "go bad" after 13-14 months use (1-2 months after
> warranty runs out). The problem is always the same - smart says there is a
> badblock problem and it logs the time/date of the error. Subsequent passes with
> smartctl -t long shows no additional problem and the drives always 'PASS'.

I didn't want to name the manufacturer because I think it would not be
fair; failure due to normal wear is acceptable and expected (which seems
to be your case). It is also normal to see some drives fail early in
life. This follows what we call here the bathtub curve: higher failure
rates at the start of life, then failure rates drop significantly, and
then they rise again at the end of life.

As a side note, here in Europe the warranty is 24 months (at least where
I live), so I doubt a manufacturer would make drives that last less than
that. Besides, some manufacturers advertise 3 or 5 year warranties on
their websites, so I guess they must be quite sure their drives are
reliable enough. You may want to look into that too and see whether you
are eligible for a free replacement from the manufacturer itself.

So far I have only seen consumer-grade drives being used in machines
that run 24/7. That is clearly a mistake, but it is the cheapest option
and I guess most of the time it works fine, which makes it hard to
justify spending more money on server-grade hardware.

Like I said before, the problem may be caused or aggravated by some
other component. Even with my limited experience I have seen some weird
problems caused by components that would not seem suspect at first
glance. The latest trend, again from my limited experience, seems to be
power supply failure: if not complete failure, at least drifting out of
spec and causing instability.

The trend seems to be to supply most components from the 12V rail to
reduce the current flowing from the power supply to the component being
supplied (core supply voltages keep decreasing with the latest
technology nodes), but hard disks (the 3.5" ones at least) still rely on
12V to spin the platters.

If you have to step the voltage down from, let's say, 12V to 1V or 3.3V
or something close to that, you have lots of working margin. But the
hard disk still requires 12V +/-10% (that is what the ATX spec requires,
i.e. roughly 10.8V to 13.2V), so if the 12V line goes out of spec, which
it can if the power supply is going bad, the hard disk may not be able
to work properly while everything else still probably works happily.
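If you want a rough sanity check without a multimeter, and assuming
lm-sensors is set up on that machine, something like this may show the
12V rail (labels vary by board and the readings are only approximate,
so treat it as a hint, not a measurement):

  sensors | grep -i 12v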

All this to say, you may not have a bad drive on your hands; it may
just be an unfortunate coincidence. If you really have a backup of all
the data, try writing to the drive while it is connected to a "good"
power supply and the problems may be gone (that happened to me before
with a 2.5" hard disk). However, it is also a good opportunity to try to
recover some data from that drive, just to learn some tricks for the
future, before you write to it and are left wondering what caused the
problem.
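If you do want the recovery practice, one approach (just a sketch,
assuming GNU ddrescue is installed and that /dev/sdX is the suspect
drive; adjust the device name) is to image the drive read-only first
and experiment on the copy:

  # grab the easy-to-read areas first, keeping a map of what was read
  ddrescue -n /dev/sdX drive.img drive.map
  # then retry the troublesome areas a few times
  ddrescue -r3 /dev/sdX drive.img drive.map

That way you can poke at the filesystem in drive.img (through a loop
device, for example) without touching the original drive again.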

-- 
Mauro Santos

