RE: Broken harddisk

For future reference:

Everyone should run a nightly disk test so bad blocks can't lurk
undetected.  smartd, badblocks or dd can be used.  Example:

  dd if=/dev/sda of=/dev/null bs=64k

Just create a nice little script that emails you the output.  Put this
script in a nightly cron job to run while the system is idle.
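As a rough sketch (device names, the cron schedule and the mail address are
all placeholders to adapt for your own system), such a script could look
like this:

```shell
#!/bin/sh
# disk-check.sh -- sketch of a nightly surface-read test.
# Run it from cron and let cron/mail deliver the output, e.g.:
#   30 3 * * * /usr/local/bin/disk-check.sh 2>&1 | mail -s "disk check" root

# check_disk: read an entire device (or file) with dd and report the result.
check_disk() {
    dev=$1
    # dd exits non-zero on a read error, and the kernel logs the bad sector
    if dd if="$dev" of=/dev/null bs=64k 2>/dev/null; then
        echo "$dev: surface read OK"
    else
        echo "$dev: READ ERRORS - check syslog and SMART data"
    fi
}

# Uncomment for the devices you actually have:
# check_disk /dev/sda
# check_disk /dev/sdb
```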

-----Original Message-----
From: linux-raid-owner@xxxxxxxxxxxxxxx
[mailto:linux-raid-owner@xxxxxxxxxxxxxxx] On Behalf Of Gordon Henderson
Sent: Saturday, January 29, 2005 10:56 AM
To: T. Ermlich
Cc: linux-raid@xxxxxxxxxxxxxxx
Subject: Re: Broken harddisk

On Sat, 29 Jan 2005, T. Ermlich wrote:

> That's right: each harddisk is partitioned absolutely identically, like:
>      0 - 19456 - /dev/sda1 - extended partition
>      1 - 6528  - /dev/sda5 - /dev/md0
>   6529 - 9138  - /dev/sda6 - /dev/md1
>   9139 - 16970 - /dev/sda7 - /dev/md2
> 16971 - 19456 - /dev/sda8 - /dev/md3
> And after doing those partitionings I 'combined' them to act as raid1.

> I have two additional IDE drives in that system.
> /dev/hda contains some data, and is the boot drive, /dev/hdb contains
> some less important data.

Just as a point of note - if the boot disk goes down it will be harder to
recover the data... Consider making the boot disk mirrored too!

> >   mdadm --add /dev/md0 /dev/sda1
> >   mdadm --add /dev/md1 /dev/sda2
> >   mdadm --add /dev/md2 /dev/sda3
> >   mdadm --add /dev/md3 /dev/sda4
>
> Now some new trouble starts ...?
> 'mdadm --add /dev/md0 /dev/sda1' started just fine - but exactly at 50%
> it started giving tons of errors, like:

You should be using:

  mdadm --add /dev/md0 /dev/sda5

> [quote]
> Jan 29 16:10:24 suse92 kernel: Additional sense: Unrecovered read error
> - auto reallocate failed
> Jan 29 16:10:24 suse92 kernel: end_request: I/O error, dev sdb, sector
> 52460420

That is a read error from /dev/sdb. What it's saying is that sdb has bad
sectors which can't be recovered.

You have 2 bad drives in a RAID-1 - and that's really bad )-:

> Personalities : [raid1]
> md3 : active raid1 sdb8[1]
>        19960640 blocks [2/1] [_U]
>
> md2 : active raid1 sdb7[1]
>        62910400 blocks [2/1] [_U]
>
> md1 : active raid1 sdb6[1]
>        20964672 blocks [2/1] [_U]
>
> md0 : active raid1 sdb5[1] sda5[2]
>        52436032 blocks [2/1] [_U]
>        [==========>..........]  recovery = 50.0% (26230016/52436032)
> finish=121.7min speed=1050K/sec
> unused devices: <none>
> [/quote]
>
> Can I stop that process for /dev/md0, and start with /dev/md1 (just to
> compare if it's a problem with that partition only, or a general problem,
> so that e.g. the second drive has problems, too)?

Yes - just fail & remove the drive partition:

  mdadm --fail   /dev/md0 /dev/sda5
  mdadm --remove /dev/md0 /dev/sda5

At this point, I'd run a badblocks on the other partitions before doing
the resync:

  badblocks -s -c 256 /dev/sdb6
  badblocks -s -c 256 /dev/sdb7
  badblocks -s -c 256 /dev/sdb8

If these pass, you can do the hot-add; however, it looks like the sdb disk
is also faulty.
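Purely as a sketch, here is the remaining hot-add sequence printed as a
dry run first (the md/partition pairings follow the table you quoted
above) - review the output, then run the commands by hand:

```shell
#!/bin/sh
# plan_hotadd: print the mdadm hot-add commands for the remaining arrays.
# Pairings assume the sda5..sda8 layout quoted earlier in this thread.
plan_hotadd() {
    for pair in "md1 sda6" "md2 sda7" "md3 sda8"; do
        set -- $pair                      # split "mdX sdaY" into $1 and $2
        echo "mdadm --add /dev/$1 /dev/$2"
    done
}

plan_hotadd
```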

At this point, I'd be looking to replace both disks and restore from
backup.  But if you can re-sync the other 3 partitions, then remove the
also-faulty sdb and replace it with a new one, you can re-sync the 3
good partitions onto it and only have to restore the '5' partition (md0)
from backup.

You could try mkfs'ing the new partition sda5, mounting it, and copying
the data on md0 over to it - there's a chance the bad sectors on sdb lie
outside the filesystem... This would save you having to restore from
backup; however, it becomes trickier, as you then have to re-create the
raid set on a new disk with a missing drive, and copy the data again.
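That copy could be sketched like this - the mkfs invocation and mount
points are illustrative only, and sda5 must already be failed/removed
from md0 before you touch it:

```shell
#!/bin/sh
# Sketch of the copy-instead-of-restore approach (all paths illustrative):
#   mkfs.ext3 /dev/sda5
#   mount /dev/sda5 /mnt/new
#   mount -o ro /dev/md0 /mnt/old     # degraded array, mounted read-only

# copy_tree: copy everything from one mounted tree to another.
copy_tree() {
    # cp -a preserves ownership, permissions, timestamps and symlinks
    cp -a "$1/." "$2/"
}

# copy_tree /mnt/old /mnt/new
```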

> btw: does mdadm also format the partitions?

No... You don't need to format/mkfs the partitions, as the raid resync
will take care of making it a mirror of the existing working disk.

Gordon
-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

