On Tue, Jan 07, 2003 at 08:50:28AM -0500, Maurice Volaski wrote:
> Box with Redhat 7.1 and kernel 2.4.20 has a hardware RAID (about 750
> gigs data) attached via an Adaptec 29160 LP card...
>
> I began seeing numerous SCSI errors in my logs for our external
> hardware RAID and input/output errors on test attempts to copy files
> via cp. Rebooted and immediately saw the errors as the disk was
> initially accessed by the Adaptec driver. The RAID controller did not
> report any problems and when I swapped cards, those errors stopped.
>
> However, fsck.ext3 (1.27) immediately detected errors.
>
> The question is how do I know to trust fsck? What exactly does it
> mean that a file shares blocks with other files? How does fsck know
> file the block really belongs to or could it actually mean files are
> corrupted and fsck is letting them get by?

What probably happened here is that the RAID controller got confused
and wrote parts of the inode table to the wrong location on disk.  So
for example, suppose the block from the inode table describing inodes
8-15 got written on top of the block in the inode table which is
supposed to describe inodes 32-39.  This will result in inodes 8 and
32 claiming the same blocks, and thus fsck will complain.

Does fsck know which file a block "really" belongs to?  Nope; it
doesn't have psychic abilities.  In this scenario, the information
about which blocks are associated with inodes 32-39 is gone, and was
replaced with the blocks associated with inodes 8-15.

What e2fsck will do in this case is allocate new blocks and fill them
with a copy of the data, so that inodes 8-15 and 32-39 each have their
own unique set of data blocks.  However, it doesn't restore the
missing data -- it can't.  What this does do is make the filesystem
consistent, so that it's safe to mount the filesystem.  The system
administrator must then sort through the files to determine which
files have valid data, and which ones do not.
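
The inode 8 / inode 32 example can be sketched as a toy Python model.
This is only an illustration of the idea, not e2fsck's actual code:
the function names (find_shared_blocks, clone_shared_blocks), the
dictionaries standing in for the inode table and the disk, and the
block numbers are all made up for the example.

```python
from collections import defaultdict


def find_shared_blocks(inodes):
    """Map each block claimed by more than one inode to its claimants."""
    owners = defaultdict(list)
    for ino, blocks in inodes.items():
        for blk in blocks:
            owners[blk].append(ino)
    return {blk: inos for blk, inos in owners.items() if len(inos) > 1}


def clone_shared_blocks(inodes, disk, first_free_block):
    """Give every claimant but the first a private copy of each shared block.

    The copy holds the same bytes as the original, so this makes the
    block ownership consistent but recovers no lost data.
    """
    next_free = first_free_block
    for blk, claimants in sorted(find_shared_blocks(inodes).items()):
        for ino in claimants[1:]:
            disk[next_free] = disk[blk]  # copy the contents
            inodes[ino] = [next_free if b == blk else b for b in inodes[ino]]
            next_free += 1
    return inodes, disk


# Inode 32's table entry was overwritten with a copy of inode 8's entry,
# so both inodes now point at blocks 100 and 101.
inodes = {8: [100, 101], 32: [100, 101]}
disk = {100: b"inode 8 data, part 1", 101: b"inode 8 data, part 2"}

shared = find_shared_blocks(inodes)
clone_shared_blocks(inodes, disk, first_free_block=200)
```

After the clone step, inode 32 owns blocks 200 and 201 and no block is
claimed twice, but those blocks still hold inode 8's data.  Whatever
inode 32 originally pointed at is not recovered by this step.
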
This need to sort through the files by hand is one reason why e2fsck
is so meticulous about printing full pathnames during pass 1B/1C/1D
processing.  (For the same reason, if you know that there is a lot of
filesystem damage, it can be very useful to run e2fsck under
script(1), so you have a full transcript of e2fsck's output.)

> What I am asking is should I trust fsck's apparent success or should
> I choose to reformat and restore?

If you have reliable backups, by all means use them.  On the other
hand, if there is some precious data that was not backed up, it might
be worth going through the filesystem to see what you can save before
you give up on it.

Good luck!

                                        - Ted

P.S.  Consider yourself fortunate that you have backups!  This is not
the first time that I've seen a RAID controller go insane and wipe
out huge amounts of data.  And it's fairly common for sysadmins to
assume that RAID means they don't need to do backups, since they're
protected against disk failures, and then get totally screwed when
the RAID controller goes insane.

_______________________________________________
Ext3-users@redhat.com
https://listman.redhat.com/mailman/listinfo/ext3-users