On 06/10/2010 02:00 AM, David C. Rankin wrote:
> I do like badblocks. It saved my bacon once before. Rather than doing the
> badblock recovery (since I have the data), what I think I'll do is search a bit
> more for the 'fdisk -l' info for the drive. If I find it, I'll try recreating
> the partitions and see what is left with the drive. If not, then I'll just add
> the drive to the pile. Eventually I'll do some type of chronological art exhibit
> with drives. Everything from 8 'Meg' MFMRLL drives to the new Seagate 500-750G
> drives that drop like flies now for some reason :p
>

I guess you can't recover much more from the drive just by trying to read
from it (unless you get hold of some advanced tool that can make sense of
the whole drive). The problem may not be caused by a drive failure but by a
combination of factors: you said this particular disk has been running for a
few years without problems, and there is no indication of failure in the
SMART attributes (I have read that SMART catches only about 2/3 of
failures). In my experience, consumer-grade power supplies go bad after 2 or
3 years of continuous use, so a bad power supply coupled with a worst case
for the hard disk can lead to problems; that is why I suggested badblocks to
look for problems while keeping an eye on the SMART attributes (a rough
sketch of the commands I mean is at the end of this message). You may also
have a hardware failure somewhere else: the motherboard or the hardware
connected directly to the disk are good candidates (as much as anything
else, really, if the system is 5 or 6 years old).

From my experience only, I find it quite hard to know when a disk is about
to fail. Currently I am trying to figure out whether a hard disk in a
machine I manage is about to fail (a 3.5" drive). SMART says it is, but
badblocks can't find anything wrong with the drive (even after 2 full write
passes); one of the SMART attributes, the one flagged FAILING_NOW, increases
by one with each full read cycle, yet the attributes do not report any
reallocated sectors. This is a new drive (6 months old, give or take), and
the other drives in the machine get exactly the same usage without showing
any signs of trouble (the serial numbers of the drives are all very close,
almost sequential, all from the same manufacturer). I have had some trouble
with a drive from the same manufacturer before (a 2.5" drive), but things
seem to have gone smoothly after I did just one
'dd if=/dev/zero of=/dev/sd?' and then read it back (also sketched at the
end of this message); no SMART attribute said the drive was failing that
time, so it might have been just a bad coincidence.

As far as I can see, you have done the best thing you could have done, which
is to keep backups of the important data. Now all you can do is decide
whether that drive can still be used, and trust it a bit less (put it in a
RAID array that can tolerate failures). Unless the drive fails terribly,
with no margin for doubt, it is hard to say from the user's point of view
whether it is really failing or not.

-- 
Mauro Santos
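
P.S. The badblocks plus SMART check I had in mind is roughly the following.
This is only a sketch: /dev/sdX is a placeholder for the real device, and
the -w write test destroys everything on the disk, so use -n instead if the
data still matters.

  # destructive write test (4 patterns), with progress and verbose output
  badblocks -wsv /dev/sdX

  # dump the SMART attributes before and after, and look at the error log
  smartctl -A /dev/sdX
  smartctl -l error /dev/sdX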
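
P.P.S. The zero-fill and read-back I mentioned was essentially this (again,
/dev/sdX is just a placeholder and the first command wipes the whole
drive). An I/O error on either pass, or reallocated/pending sectors showing
up afterwards, would be a bad sign.

  # write zeros over the whole drive, then read it all back
  # (dd will complain about "no space left on device" at the very end of
  #  the write pass, that is expected when writing a whole block device)
  dd if=/dev/zero of=/dev/sdX bs=1M
  dd if=/dev/sdX of=/dev/null bs=1M

  # check the counters that matter most after the exercise
  smartctl -A /dev/sdX | grep -i -e reallocated -e pending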