Hi,

On Tue, 2003-02-04 at 01:48, Andrew Morton wrote:

> Now, generally the kernel will attempt to prevent serialising userspace
> behind background writeout.  But there's one spot in do_get_write_access():
>
>         if (jh->b_jlist == BJ_Shadow) {
>
> where a random mark_inode_dirty() call will serialise behind the ongoing
> transaction commit.

That is a deliberate choice, but it's something I've wondered about.

Basically, the problem is this: the journal *must* be a consistent
snapshot of the filesystem, but at the same time, we want to avoid
having to do an actual copy of all dirty data for the journal.

So, we don't do the copy if we can avoid it.  If, during the commit,
another transaction tries to modify the data, we just let it do so, and
we make a copy on the spot.  *But*, if we have already scheduled the old
data for IO at that point, then we can't do this, and we block.

The only way to avoid this is to do the copy in the first place, during
commit, before we know whether or not anybody will need the copy; and
that will be expensive on CPU time if it turns out that nobody needs the
copy.

mark_inode_dirty() is a special case, though, and the current ext3 dev
snapshots avoid that blocking on buffer-cache operations for the most
part; but we still need to reserve the journal space for the operation,
and that still blocks in the case above.

I wonder if it might be worth special-casing inodes and superblocks, and
always doing the commit copy for those.

Cheers,
 Stephen

_______________________________________________
Ext3-users@redhat.com
https://listman.redhat.com/mailman/listinfo/ext3-users
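
For readers following the thread, the decision described above (write
directly, copy on the spot, or block) can be illustrated with a small
userspace toy model.  This is only a sketch under stated assumptions, not
the JBD code: struct toy_buffer, toy_get_write_access(), the in_commit
flag, and the JL_* and ACCESS_* names are all hypothetical.  Only the idea
of a "shadow" state, taken from the BJ_Shadow test quoted in the thread,
comes from the original discussion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define TOY_BLOCK_SIZE 16

/* Hypothetical list states; JL_SHADOW stands in for BJ_Shadow, i.e.
 * "the old contents have already been scheduled for journal IO". */
enum toy_jlist { JL_NONE, JL_METADATA, JL_SHADOW };

/* Hypothetical outcomes of asking for write access. */
enum toy_access {
    ACCESS_DIRECT,    /* buffer not in the committing transaction */
    ACCESS_COPIED,    /* old contents frozen; caller may modify data */
    ACCESS_MUST_WAIT  /* IO already scheduled; caller has to block */
};

struct toy_buffer {
    char data[TOY_BLOCK_SIZE]; /* live contents, modified by new writers */
    char *frozen;              /* snapshot kept for the committing txn */
    enum toy_jlist jlist;
    bool in_commit;            /* owned by the committing transaction? */
};

/* Toy version of the choice described in the mail (assumed logic). */
enum toy_access toy_get_write_access(struct toy_buffer *b)
{
    assert(b != NULL);

    if (!b->in_commit)
        return ACCESS_DIRECT;

    /* Old data is already queued for IO: too late to copy, so block. */
    if (b->jlist == JL_SHADOW)
        return ACCESS_MUST_WAIT;

    /* Lazy copy: only pay for the copy once somebody needs it. */
    if (b->frozen == NULL) {
        b->frozen = malloc(TOY_BLOCK_SIZE);
        if (b->frozen == NULL)
            abort();
        memcpy(b->frozen, b->data, TOY_BLOCK_SIZE);
    }
    return ACCESS_COPIED;
}
```

In this model, a writer that arrives before the buffer reaches the shadow
state gets ACCESS_COPIED and can change data while frozen keeps the old
contents for the commit.  A writer that arrives afterwards gets
ACCESS_MUST_WAIT, which corresponds to the serialisation Andrew points
out.  Always copying at commit time would remove the ACCESS_MUST_WAIT case
at the cost of a memcpy per buffer, whether or not anybody needs the copy.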