> I have a disk that seems to have bad sectors:
>
> hdd: dma_intr: status=0x71 { DriveReady DeviceFault SeekComplete Error }
> hdd: dma_intr: error=0x04 { DriveStatusError }
> hdd: DMA disabled
> ide1: reset: success
> hdd: set_geometry_intr: status=0x71 { DriveReady DeviceFault SeekComplete Error }
> hdd: set_geometry_intr: error=0x04 { DriveStatusError }
> ide1: reset: success
> hdd: set_geometry_intr: status=0x71 { DriveReady DeviceFault SeekComplete Error }
> hdd: set_geometry_intr: error=0x04 { DriveStatusError }
> end_request: I/O error, dev 16:41 (hdd), sector 56223488
> hdd: recal_intr: status=0x71 { DriveReady DeviceFault SeekComplete Error }
> hdd: recal_intr: error=0x04 { DriveStatusError }
> ide1: reset: success
> ...
> hdd: dma_intr: status=0x51 { DriveReady SeekComplete Error }
> hdd: dma_intr: error=0x10 { SectorIdNotFound }, LBAsect=23844548, sector=23844480
> ...
>
> and so on, for a number of sectors.
>
> This drive, hdd, has one partition, hdd1, that participates in an md0
> array. Let's assume I can't just get rid of the drive that has
> problems; I have to keep it, and it has to stay in the array. If it
> wasn't part of the array, I could run a badblocks -w test, find the
> numbers of the failing sectors and feed them to mke2fs/e2fsck or
> whatever other utility the filesystem on hdd1 has for marking bad
> blocks, and the problem would (hopefully) end there.
>
> However, since hdd1 is part of the array, I suppose it's not possible
> to run badblocks on that single device and then somehow map the blocks
> on hdd1 to blocks on md0, especially since it's striping, raid-0,
> right? Is there some way to find out which sectors of md0 are on this
> drive, so I can limit the range of sectors to run badblocks on?
> Running badblocks read+write on such a huge device can take ages.
>
> If anyone has any other suggestions, they are also welcome :)

If you are running raid1, there is a solution -- kludgy, but it will work.

Remove the bad drive from the array. Take the second drive and mark it
as a regular set of partitions rather than raid, and restart it as a
stand-alone file system. The raid superblock is at the end of the
partition, so if you have formatted it as ext2 the partitions will mount
normally if the fstab, kernel, etc. are asked to mount the underlying
partitions instead of the raid device.

Do a complete reformat of the bad raid device -- it should be running
with the other (good) disk marked failed, while that good disk is really
mounted directly as the real file system, /dev/hdx, etc. If you have a
rescue disk that can do the mounting of the old disks, better yet -- then
you don't have to alter the fstab, etc.

Use cpio to transfer the disk image from the old good disk to the newly
formatted BAD disk. I'd suggest doing the first directory levels
independently so you can avoid the /dev, /proc, /tmp and /mnt
directories. Create those by hand, copy /dev by hand using cp -a, and
for each of the other top-level directories do something like:

    cd / ; find ./targetdir | cpio -padm /mnt

I've done this many times without a backup, though I don't recommend it:
if you screw up, you're dead. Better to take a spare disk and sync it to
the remaining good one so you have a backup (faster, easier), or run a
backup tape. If you choose to go the spare disk route, use it instead of
the original -- test it carefully to make sure the files really
transferred as you expected (a memory error can eat your lunch).

Once the transfer is complete, remount the new BAD disk as the OS file
system and do a raid hot add of the old good disk. It will sync with the
bad blocks ignored.
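The way I read those steps, in shell terms, is roughly the sketch below.
All of the names are assumptions about the setup -- a 2.4-era raid1
/dev/md0 built from /dev/hdc1 (good) and /dev/hdd1 (bad), raidtools
commands, scratch mount points /mnt/good and /mnt/new -- and the -c
badblocks scan on mke2fs is an extra in the spirit of the original
question. mdadm's --fail/--remove/--add do the same jobs if you have it.
Treat this as an outline, not something to paste in blindly:

    # Assumes md0 is NOT the filesystem you are currently running from;
    # a rescue disk, as suggested above, makes this much easier.

    # Fail the good disk out of the array and mount it directly; the raid
    # superblock lives at the end of the partition, so the ext2 filesystem
    # on /dev/hdc1 mounts as-is.
    raidsetfaulty /dev/md0 /dev/hdc1
    raidhotremove /dev/md0 /dev/hdc1
    mount -t ext2 /dev/hdc1 /mnt/good

    # Reformat the array (now containing only the bad disk), letting mke2fs
    # scan for bad blocks so they are kept out of the new filesystem.
    mke2fs -c /dev/md0
    mount /dev/md0 /mnt/new

    # Copy the top-level directories one at a time so /dev, /proc, /tmp and
    # /mnt can be skipped; recreate those by hand and copy /dev with cp -a.
    # (The directory list here is only an example.)
    cd /mnt/good
    find ./etc ./usr ./var ./home -depth -print | cpio -padm /mnt/new
    mkdir /mnt/new/proc /mnt/new/tmp /mnt/new/mnt
    chmod 1777 /mnt/new/tmp
    cp -a /mnt/good/dev /mnt/new/

    # Point fstab and the bootloader at /dev/md0, boot onto the rebuilt
    # array, then hot add the old good disk and let it resync.
    raidhotadd /dev/md0 /dev/hdc1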
I've used this exact technique ONCE only, and it did work fine. It was a
while ago, and as I recall it did produce some errors in the area where
the bad blocks reside, but nowhere else. The system has been running for
some time and I've encountered no problems with it.

raid1 on a PII with 2.4.17, 2 - 3.5 gig IDE drives

Michael