Hi
I have a FireWire-connected Micronet 1.5TB RAID, with a single large ext3 filesystem on one partition, attached to a dual Xeon system.
I am checking out from an extremely large CVS repository (don't ask) to this drive over the course of many days, and intermittently I get bad blocks and the filesystem goes read-only. This is not related to any power failure or anything similar. The RAID is currently about 40% full; as I recall, the errors started around the 15% mark.
I checked the RAID firmware setup, found that caching was set to write-back, and changed it to write-through to see if that would help (since I gather the Linux kernel presumes write-through, though why it should make a difference in the absence of a reboot or power failure I don't understand).
This reduced the frequency of the error from once a night to once every couple of nights. Interestingly, it mostly happens at about 04:03. Looking at cron.daily, only mrtg and sa seem to be starting up at about that time.
I suspect the timing is related to a change in the pattern of disk activity rather than anything else.
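To check that suspicion, something like the following Python sketch could tally when the EXT3 errors occur and which block numbers are reported bad. The function name `summarize` and the regular expressions are my own illustration, not an existing tool, and they assume the syslog line format shown in the kernel messages quoted at the end of this message.

```python
import re
from collections import Counter

# Assumed syslog format, e.g.:
# "May 15 04:03:30 localhost kernel: EXT3-fs error (device sdd1): ..."
LINE_RE = re.compile(
    r"^(?P<month>\w{3})\s+(?P<day>\d{1,2})\s+"
    r"(?P<hour>\d{2}):(?P<minute>\d{2}):\d{2}\s+"
    r"\S+\s+kernel:\s+EXT3-fs error \(device (?P<dev>\w+)\)"
)
BAD_BLOCK_RE = re.compile(r"bad block (\d+)")


def summarize(lines):
    """Count EXT3-fs error lines per (month, day, hour) and collect bad blocks.

    Returns a tuple: (Counter keyed by (month, day, hour),
                      sorted list of distinct bad block numbers).
    """
    per_hour = Counter()
    bad_blocks = set()
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            # Not an EXT3-fs error line (or not in the assumed format).
            continue
        key = (match["month"], int(match["day"]), int(match["hour"]))
        per_hour[key] += 1
        block = BAD_BLOCK_RE.search(line)
        if block:
            bad_blocks.add(int(block.group(1)))
    return per_hour, sorted(bad_blocks)
```

Feeding it the lines of the system log (one string per line) would show whether the errors really cluster around the cron.daily hour, and whether the same block numbers recur from one night to the next.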
I have no reason to suspect that there is anything actually wrong with the RAID itself, which just appears as a really big FireWire external disk. It is new, however, so a hardware fault can't be ruled out.
My next step is to turn off journaling and see whether the checkout works OK with plain ext2. Journaling doesn't seem to be doing much good, as I am stuck regularly running ordinary fscks with all these errors anyway!
I just thought I would ask if anyone else has had a similar experience, and whether such issues are known to be with ext3, or the firewire interface, or both together.
PS. I created the partition and ran mkfs on an AMD64 FC3 system at a different site, which is not the system to which the RAID is currently connected. I mention this in case it makes a difference, but I presume an fsck would have noticed and fixed anything fundamentally wrong in this regard.
David
May 15 04:03:30 localhost kernel: Aborting journal on device sdd1.
May 15 04:03:30 localhost kernel: EXT3-fs error (device sdd1): ext3_journal_start_sb: Detected aborted journal
May 15 04:03:30 localhost kernel: EXT3-fs error (device sdd1): ext3_xattr_get: inode 63343526: bad block 165510584
May 15 04:03:30 localhost kernel: EXT3-fs error (device sdd1) in start_transaction: Journal has aborted
May 15 04:03:30 localhost kernel: EXT3-fs error (device sdd1) in start_transaction: Journal has aborted
May 15 04:03:30 localhost kernel: inode_doinit_with_dentry: getxattr returned 5 for dev=sdd1 ino=63343526
May 15 04:03:34 localhost kernel: EXT3-fs error (device sdd1): ext3_xattr_get: inode 63343381: bad block 141623810
May 15 04:03:34 localhost kernel: EXT3-fs error (device sdd1): ext3_xattr_get: inode 63947123: bad block 203323361
Linux localhost.localdomain 2.6.9-1.667smp #1 SMP Tue Nov 2 14:59:52 EST 2004 i686 i686 i386 GNU/Linux
_______________________________________________
Ext3-users@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/ext3-users