On Wed, Jul 11, 2007 at 05:18:35PM -0600, Stephen John Smoogen alleged:
> On 7/11/07, Eduardo Grosclaude <eduardo.grosclaude@xxxxxxxxx> wrote:
> > Out of the blue, dmesg on my HP ProLiant w/ a SCSI disk gives loads of
> > messages like this one:
> >
> >   EXT3-fs error (device dm-0) in start_transaction: Journal has aborted
> >
> > Then the root fs goes read-only, so little else can be done on the
> > machine. LVM locks up. At restart, the fs needs a reboot to recover after
> > fsck. The host starts up OK, then I get a few more minutes before the
> > problem reappears. This is stock CentOS 4.4; I have never gotten to update
> > it because of this very same problem.
> >
> > The system logs say SCSI I/O error, but SMART says no problem has been
> > found, and neither does badblocks (run from a rescue CD boot). SCSI
> > cabling, terminator, etc. have been checked.
> >
> > What should I investigate next? Is the disk condemned?
>
> SMART isn't fool-proof. I have had disks that go 'clunk/scraping
> sounds/spin up' that have gotten the SMART seal of approval. My normal
> checklist is the above, but replacing the items (in case that isn't
> what you meant by "check"):
>
> Replace
>   terminator
>   SCSI cable
>   controller
>   disk drive
>
> though I usually do the disk drive before the controller.

In my experience the drive is, by far, the most likely to have problems.
Personally, I never suspect anything else until I've fully tested the drive.

--
Garrick Staples, GNU/Linux HPCC SysAdmin
University of Southern California

Please avoid sending me Word or PowerPoint attachments.
See http://www.gnu.org/philosophy/no-word-attachments.html
_______________________________________________
CentOS mailing list
CentOS@xxxxxxxxxx
http://lists.centos.org/mailman/listinfo/centos