On May 18, 2007 09:48 -0400, Mats Ahlgren wrote:
> Namely, I'm confused: I would guess caching simply delays the time data gets
> to disk, and perhaps exacerbates data being written in not-the-order it was
> given? But, how could this cause a problem on a journaled filesystem? if one
> is (theoretically) only appending to the journal, checksumming/hashing to
> detect consistent journal entries on failure (since the last checkpoint), and
> only replaying consistent journal entries (which are idempotent)... then,
> assuming all those things above work, how could caching cause massive
> corruption of the directory tree? (Is the above an accurate model for ext3?)

One issue is that we do not YET have journal checksumming to detect the
case where the commit block has been written to the disk but some of the
disk-cached blocks in the rest of that transaction have not. That is
where the big risk comes in for writeback cache in the device.

Ideally, the jbd layer could be notified when the transaction blocks are
flushed from device cache before writing the commit block, but the
current Linux mechanism to do this (write barriers) sucks
performance-wise (it sent throughput from 180MB/s to 7MB/s when enabled
in our test systems). It was better to just turn off write cache
entirely than to use barriers.

We have a patch for journal checksumming that is _right_ on the verge of
being ready for fixing the "commit-block before transaction blocks"
problem. In fact, in earlier testing it improved performance in some
cases, because it allows the commit block to always be sent to disk at
the same time as the transaction blocks: we know the checksum will tell
us if any blocks were not written to disk.

Girish, could you post your latest tested patch here for review?

> Also, does anyone think data-journaling mode being 'ordered' instead
> of 'journaled' had anything to do with it?

Seems unlikely.
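
For anyone following along, here is a minimal sketch of the commit-block
checksum idea described above. It is a toy model in Python, not the jbd
code and not the patch under discussion: the dictionary layout, the
16-byte BLOCK_SIZE, the choice of CRC32 and all of the function names
are assumptions made purely for illustration.

```python
import zlib

BLOCK_SIZE = 16  # toy size (assumption); real journal blocks are far larger


def checksum(blocks):
    """CRC32 computed over all transaction blocks, in order."""
    crc = 0
    for block in blocks:
        crc = zlib.crc32(block, crc)
    return crc


def write_transaction(blocks):
    """Build a toy on-disk image: the data blocks plus a commit record."""
    return {
        "blocks": list(blocks),
        "commit": {"nblocks": len(blocks), "crc": checksum(blocks)},
    }


def lose_cached_block(txn, index):
    """Simulate power loss: the commit record reached the platter,
    but one block still sitting in the drive's write cache did not."""
    damaged = {"blocks": list(txn["blocks"]), "commit": dict(txn["commit"])}
    damaged["blocks"][index] = b"\x00" * BLOCK_SIZE  # stale contents
    return damaged


def replay_ok(txn, verify_checksum):
    """Decide whether recovery would replay this transaction."""
    commit = txn.get("commit")
    if commit is None:
        return False  # no commit record: transaction never completed
    if not verify_checksum:
        return True   # commit record alone is trusted
    return checksum(txn["blocks"]) == commit["crc"]


blocks = [b"A" * BLOCK_SIZE, b"B" * BLOCK_SIZE, b"C" * BLOCK_SIZE]
txn = write_transaction(blocks)
damaged = lose_cached_block(txn, 1)

print(replay_ok(damaged, verify_checksum=False))  # stale block gets replayed
print(replay_ok(damaged, verify_checksum=True))   # transaction is discarded
```

Without the checksum, recovery trusts the commit record and replays a
stale block. With it, the incomplete transaction is detected and
skipped, which is why the commit block no longer has to wait for the
other blocks to be flushed first.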
> On Sunday 18 March 2007 09:33:59 Theodore Tso wrote:
> > It sounds like you have a disk which is doing very aggressive write
> > caching. If you are using a new enough kernel (2.6.9 or greater
> > should have this), adding "barrier=1" to your mount options should
> > help. We should probably make this the default at this point...
> >
> > - Ted

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.

_______________________________________________
Ext3-users mailing list
Ext3-users@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/ext3-users