All,

Comments in-line.

--- Larry McVoy <lm@xxxxxxxxxxxx> wrote:
> The one thing I'd add to Joseph's good advice is that when I see stuff like
> this (which I do, I manage a lot of Linux boxes) I tend to start swapping
> things. Put the drive in a known good system with a known good cable on
> the cable by itself and then see if you get errors. If you don't get
> errors in that situation it is likely your drive is fine and you have
> some bad hardware elsewhere.
>
> Hardware debugging is basically swapping parts until you find the guilty
> party.

Thanks to both you and Joseph for making me think about things that I
simply wouldn't have (or, at least not without first fixing something that
wasn't broke). I would have immediately suspected the hard drive, not
cables or other hardware. But, I guess that comes with experience, so
thanks for sharing.

The first thing I'll do is make sure that the cables are secure, and
swapping cables as a quick test is easy enough to do. The controller is
integrated into the MB, so that would be more problematical :)

> On Sat, Jan 01, 2005 at 01:28:39PM -0600, Joseph D. Wagner wrote:
> > > Getting errors similar to:
> > >
> > > Dec 31 20:44:30 mybox kernel: hdb: dma_intr: status=0x51 { DriveReady SeekComplete Error }
> > > Dec 31 20:44:30 mybox kernel: hdb: dma_intr: error=0x40 { UncorrectableError }, LBAsect=163423, high=0, low=163423, sector=163360
> > > Dec 31 20:44:30 mybox kernel: end_request: I/O error, dev 03:41 (hdb), sector 163360
> >
> > This may not be the disk; it could also be the controller. I've seen it go
> > both ways. Any problems on hda?

No problems on hda. But, if it's the controller, that's built into the MB,
so that wouldn't be good.

I didn't just get DMA-type errors, there are others, like the one below.
Can't say that this is a complete list, though:

Dec 29 16:40:40 mybox kernel: hdb: read_intr: status=0x59 { DriveReady SeekComplete DataRequest Error }
Dec 29 16:40:40 mybox kernel: hdb: read_intr: error=0x40 { UncorrectableError }, LBAsect=163423, high=0, low=163423, sector=163360
Dec 29 16:40:40 mybox kernel: end_request: I/O error, dev 03:41 (hdb), sector 163360

> > Try adding ide=nodma to the kernel parameters. If the problem goes away,
> > the problem is in the kernel driver for the controller or motherboard
> > chipset.

Excellent suggestion. Will give that a try.

> > > When I rebooted, the system threw me into a shell, to get me to "fix"
> > > things. So, I did an e2fsck -c -v /dev/hdb1 to attempt to fix things.
> > > The badblocks checking took 20 hours (it's a 200GB disk). Then I went
> > > through the question/answer session, hoping to get through the
> > > problems...
> >
> > A better way to go about this is booting off the rescue CD and doing the
> > e2fsck scan there. Otherwise, there could be leftover problems from running
> > the scan off of the partition you are scanning.

Ah, now that advice is something to put in my back pocket to remember.
Never gave that a thought, since it "booted enough" to get me to a shell
prompt to run e2fsck. Guess I wasn't forced to think "Rescue CD".

> > > Some questions:
> >
> > Best advice to all 3 questions: get some sort of disk imaging software.
> >
> > The disk imaging software may copy the bad sectors (i.e. sectors marked bad
> > now may also be marked bad on the new drive), but you can force e2fsck to
> > rescan bad sectors.
> >
> > [SNIP]

Thanks a million for all of the advice. I really appreciate it! About to
go try these suggestions.

Steve

_______________________________________________
Ext3-users@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/ext3-users
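
P.S. For anyone reading the log excerpts above: the status= and error=
values are bit masks, and the names in braces are just the set bits spelled
out. Below is a minimal Python sketch of that decoding. The status bit
layout follows the standard ATA status register; the error bit names are my
recollection of what the old IDE driver prints and should be treated as an
assumption, as should the helper names (decode, split_dev), which are made
up for illustration. The real driver prints only "Busy" when the busy bit
is set; this sketch simply lists every set bit.

```python
# Sketch only: decode the status/error bytes from the hdb log lines.
# Tables are ordered high bit first, matching the order in the log output.

STATUS_BITS = [
    (0x80, "Busy"),
    (0x40, "DriveReady"),
    (0x20, "DeviceFault"),
    (0x10, "SeekComplete"),
    (0x08, "DataRequest"),
    (0x04, "CorrectedError"),
    (0x02, "Index"),
    (0x01, "Error"),
]

# Assumption: names as remembered from the kernel messages, not verified
# against the driver source.
ERROR_BITS = [
    (0x80, "BadSector"),
    (0x40, "UncorrectableError"),
    (0x10, "SectorIdNotFound"),
    (0x04, "DriveStatusError"),
    (0x02, "TrackZeroNotFound"),
    (0x01, "AddrMarkNotFound"),
]


def decode(value, table):
    """Return the names of all bits in `table` that are set in `value`."""
    return [name for mask, name in table if value & mask]


def split_dev(dev):
    """Split a 'major:minor' string such as '03:41' (hex) into two ints."""
    major, minor = (int(part, 16) for part in dev.split(":"))
    return major, minor


if __name__ == "__main__":
    print(decode(0x51, STATUS_BITS))  # dma_intr status
    print(decode(0x59, STATUS_BITS))  # read_intr status
    print(decode(0x40, ERROR_BITS))   # error byte in both cases
    print(split_dev("03:41"))         # device from the end_request line
```

The only difference between the two status values is the DataRequest bit
(0x08), so both excerpts report the same uncorrectable read error at the
same sector. LBAsect and sector also differ by exactly 63 (163423 - 163360),
which would be consistent with a first partition starting at sector 63, but
that is an inference from the numbers and not something the log states.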