I have a box running Red Hat 7.1 with kernel 2.4.20 and a hardware
RAID (about 750 GB of data) attached via an Adaptec 29160 LP card.
I began seeing numerous SCSI errors in my logs for our external
hardware RAID and input/output errors on test attempts to copy files
via cp. I rebooted and immediately saw the errors again as soon as the
Adaptec driver first accessed the disk. The RAID controller did not
report any problems, and when I swapped cards, those errors stopped.
However, fsck.ext3 (1.27) immediately detected errors.
At first, it didn't seem too bad: a few dozen inodes with illegal
blocks. But then it began reporting duplicate/bad blocks, probably a
few *thousand* of them. It pointed out hundreds of files that shared
blocks, which it apparently cloned.
At the end, despite all those errors, it claimed all was well and
considered the filesystem clean.
The initial SCSI errors occurred during a backup, when the backup
program still had a number of files left to process. After the fsck,
the backup program claimed to have about 800 fewer files left than
before, implying that these files disappeared during the fsck (and
that was just a portion of the disk).
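To pin down exactly which files went missing, rather than relying on
the backup program's count, the two file listings could be compared
directly. The following is only a minimal sketch under assumptions of
mine: it presumes I have one listing of paths from before the errors
(for example, from an earlier backup catalog) and one taken after the
fsck, each with one path per line. The names before.txt and after.txt
and both function names are hypothetical.

```python
import sys


def load_manifest(path):
    """Read a listing with one file path per line; blank lines are skipped."""
    with open(path, encoding="utf-8", errors="surrogateescape") as f:
        return {line.rstrip("\n") for line in f if line.strip()}


def missing_after_fsck(before, after):
    """Return the paths present before the fsck but absent afterwards, sorted."""
    return sorted(set(before) - set(after))


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python compare.py before.txt after.txt
    before = load_manifest(sys.argv[1])
    after = load_manifest(sys.argv[2])
    for name in missing_after_fsck(before, after):
        print(name)
```

This would only show which names vanished; it says nothing about
whether the files that remain still have intact contents.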
The question is: how do I know whether to trust fsck? What exactly
does it mean that a file shares blocks with other files? How does fsck
know which file the block really belongs to? Or could it actually mean
that files are corrupted and fsck is letting them get by?
In short, should I trust fsck's apparent success, or should I
reformat and restore?
--
Maurice Volaski, mvolaski@aecom.yu.edu
Computing Support, Rose F. Kennedy Center
Albert Einstein College of Medicine of Yeshiva University
_______________________________________________
Ext3-users@redhat.com
https://listman.redhat.com/mailman/listinfo/ext3-users