hi ya cajoline

-- try replacing the "bad cable"

-- try blowing air on your hot disks

-- your disk is about to die, or already died

-- run the disk at ata-33 instead of ata-100 and see if it's any better: hdparm -X (rough sketch below)

-- boot into single user... add a new disk... copy the bad disk onto the new disk ( do NOT use dd... you'd just copy the bad data too )

-- if you lose some files... oh well... hope you have backups... or start using disk utilities to recover file segments and manually patch them back together
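for the ata-33 and copy bits, roughly this -- an untested sketch, and the device names ( /dev/hdd, /dev/hde1 ) and mount point are guesses, so check your own setup first:

    hdparm -X66 /dev/hdd       # 64 + 2 = udma mode 2, ata-33 ( -X69 = ata-100 )
    hdparm -d1 /dev/hdd        # turn dma back on if the kernel shut it off
    # in single user, copy file-by-file instead of dd,
    # so a bad sector only costs you that one file:
    mount /dev/hde1 /mnt/new
    cp -a /data /mnt/new       # or tar/cpio... dd would copy the bad data too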
c ya
alvin
http://www.Linux-Backup.net .. free scripts/methodologies ...

On Fri, 8 Feb 2002, Cajoline wrote:

> Thanks for the suggestions, but the array is raid-0 and that can't change
> anymore :)
>
> > -----Original Message-----
> > From: linux-raid-owner@vger.kernel.org [mailto:linux-raid-owner@vger.kernel.org]
> > On Behalf Of Michael Robinton
> > Sent: Friday, February 08, 2002 10:41 PM
> > To: linux-raid@vger.kernel.org
> > Subject: badblocks & raid
> >
> > > I have a disk that seems to have bad sectors:
> > >
> > > hdd: dma_intr: status=0x71 { DriveReady DeviceFault SeekComplete Error }
> > > hdd: dma_intr: error=0x04 { DriveStatusError }
> > > hdd: DMA disabled
> > > ide1: reset: success
> > > hdd: set_geometry_intr: status=0x71 { DriveReady DeviceFault SeekComplete Error }
> > > hdd: set_geometry_intr: error=0x04 { DriveStatusError }
> > > ide1: reset: success
> > > hdd: set_geometry_intr: status=0x71 { DriveReady DeviceFault SeekComplete Error }
> > > hdd: set_geometry_intr: error=0x04 { DriveStatusError }
> > > end_request: I/O error, dev 16:41 (hdd), sector 56223488
> > > hdd: recal_intr: status=0x71 { DriveReady DeviceFault SeekComplete Error }
> > > hdd: recal_intr: error=0x04 { DriveStatusError }
> > > ide1: reset: success
> > > ...
> > > hdd: dma_intr: status=0x51 { DriveReady SeekComplete Error }
> > > hdd: dma_intr: error=0x10 { SectorIdNotFound }, LBAsect=23844548, sector=23844480
> > > ...
> > >
> > > and so on, for a number of sectors.
> > >
> > > This drive, hdd, has one partition, hdd1, which participates in an md0
> > > array. Let's assume I can't just get rid of the problem drive: I have
> > > to keep it, and it has to stay in the array. If it weren't part of the
> > > array, I could run a badblocks -w test, find the numbers of the
> > > failing sectors, and feed them to mke2fs/e2fsck or whatever other
> > > utility the filesystem on hdd1 has for marking bad blocks, and the
> > > problem would (hopefully) end there.
> > >
> > > However, since hdd1 is part of the array, I suppose it's not possible
> > > to run badblocks on that single device and then somehow map the blocks
> > > on hdd1 to blocks on md0, especially since it's striping, raid-0,
> > > right? Is there some way to find out which sectors of md0 are on this
> > > drive, so I can limit the range of sectors to run badblocks on?
> > > Running badblocks read+write on such a huge device can take ages.
> > >
> > > If anyone has any other suggestions, they are also welcome :)
> >
> > If you are running raid1, there is a solution -- kludgy, but it will
> > work.
> >
> > Remove the bad drive from the array.
> >
> > Take the second drive, mark it as a regular set of partitions rather
> > than raid, and restart it as a standalone file system. The raid
> > superblock is at the end of the partition, so if you have formatted as
> > ext2 the partitions will mount normally if the fstab, kernel, etc. are
> > asked to mount the underlying partitions instead of the raid device.
> >
> > Do a complete reformat of the bad raid device -- meanwhile the system
> > should be running with the raid marked failed and the good disk mounted
> > directly as the real file system, /dev/hdx... etc.
> >
> > Better yet, if you have a rescue disk that can mount the old disks, you
> > don't have to alter the fstab, etc.
> >
> > Use cpio to transfer the disk image from the old good disk to the newly
> > formatted BAD disk. I'd suggest doing the first directory levels
> > independently so you can avoid the /dev, /proc, /tmp and /mnt
> > directories. Create those by hand, and copy /dev by hand using cp -a:
> >
> > cd /
> > find ./targetdir | cpio -padm /mnt
> >
> > I've done this many times without backup, though I don't recommend it.
> > If you screw up, you're dead. Better to take a spare disk and sync it
> > to the remaining good one so you have a backup (faster and easier), or
> > run a backup tape. If you choose to go the spare disk route, use it
> > instead of the original -- test it carefully to make sure the files
> > really transferred as you expected (a memory error can eat your lunch).
> >
> > Once the transfer is complete, remount the new BAD disk as the OS file
> > system and do a raid hot add of the old good disk. It will sync with
> > the bad blocks ignored.
> >
> > I used this exact technique ONCE only, and it did work fine. It was a
> > while ago, and as I recall it did produce some errors in the area where
> > the bad blocks reside, but nowhere else. The system has been running
> > for some time and I've encountered no problems with it. raid1 on a PII
> > with 2.4.17, 2 x 3.5 gig IDE drives.
> >
> > Michael
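ps -- putting michael's raid1 steps together as one rough sequence... an untested sketch with raidtools-era commands, and the device names ( /dev/hdd1 = reformatted bad disk, /dev/hdc1 = old good disk ), mount points and directory list are all assumptions -- adjust for your own box:

    # system running from the old good disk, mounted as plain ext2
    mke2fs -c /dev/hdd1            # -c scans for bad blocks while formatting
    mount /dev/hdd1 /mnt
    cd /
    # copy the top-level trees one at a time, skipping /dev /proc /tmp /mnt
    for d in bin boot etc home lib root sbin usr var; do
        find ./$d | cpio -padm /mnt
    done
    mkdir /mnt/proc /mnt/tmp /mnt/mnt
    chmod 1777 /mnt/tmp
    cp -a /dev /mnt                # device nodes copied by hand, per michael
    # once the system is running from the new disk, re-add the good one:
    raidhotadd /dev/md0 /dev/hdc1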