On Mon, Sep 21, 2009 at 06:02:42PM +0800, Jan Kara wrote:
> On Mon 21-09-09 17:53:26, Wu Fengguang wrote:
> > On Mon, Sep 21, 2009 at 01:35:46PM +0800, Wu Fengguang wrote:
> > > > > Here is how I'd imagine the writeout logic should work:
> > > > > We would have just two lists - b_dirty and b_more_io. Both would be
> > > > > ordered by dirtied_when.
> > > >
> > > > Andrew has a very good description for the dirty/io/more_io queues:
> > > >
> > > > http://lkml.org/lkml/2006/2/7/5
> > > >
> > > > | So the protocol would be:
> > > > |
> > > > | s_io: contains expired and non-expired dirty inodes, with expired ones at
> > > > | the head. Unexpired ones (at least) are in time order.
> > > > |
> > > > | s_more_io: contains dirty expired inodes which haven't been fully written.
> > > > | Ordering doesn't matter (unless someone goes and changes
> > > > | dirty_expire_centisecs - but as long as we don't do anything really bad in
> > > > | response to this we'll be OK).
> > > > |
> > > > | s_dirty: contains expired and non-expired dirty inodes. The non-expired
> > > > | ones are in time-of-dirtying order.
> > > >
> > > > Since then s_io was changed to hold only _expired_ dirty inodes at the
> > > > beginning of a full scan. It serves as a bounded set of dirty inodes,
> > > > so that when a full scan of it has finished, the writeback can go on to
> > > > the next superblock, and old dirty files' writeback won't be delayed
> > > > infinitely by pouring in newly dirty files.
> > > >
> > > > It seems that the boundary could also be provided by some
> > > > older_than_this timestamp. So removal of b_io is possible,
> > > > at least for this purpose.
> > >
> > > Yeah, this is a scratch patch to remove b_io, I see no obvious
> > > difficulties in doing so.
> >
> > However the removal of b_io is not that good for possible b_dirty
> > optimizations. For example, we could use a tree for b_dirty for more
> > flexible ordering. Or we can introduce a b_dirty_atime to hold the inodes
> > dirtied by atime and expire them much more lazily:
> >
> >                       expire > 30m
> >         b_dirty_atime --------------+
> >                                     |
> >                                     +--- b_io ---> writeback
> >                                     |
> >         b_dirty --------------------+
> >                       expire > 30s
>
>   Well, you can still implement the above without a need for b_io list. The
> kupdate-style writeback can for example check the first inode in both lists
> and process the inode which is expired for a longer time.

OK. Given that relatime is the default now, such optimization seems less
useful anyway.

Thanks,
Fengguang
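
For illustration only, here is a rough userspace sketch of the selection
rule Jan describes: look at the first (oldest) inode on each of the two
lists and process the one that has been expired for longer. This is not
kernel code. The names demo_inode, demo_list, overdue() and pick_next()
are hypothetical, time is a plain counter in seconds instead of jiffies
(so there is no wraparound handling), and "now" is assumed to be no
earlier than any dirtied_when.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for a dirty inode (not struct inode). */
struct demo_inode {
	unsigned long dirtied_when;	/* when the inode was dirtied, in seconds */
	struct demo_inode *next;	/* next-younger inode on the same list */
};

/* A list kept in dirtied_when order, oldest first, with its own expiry. */
struct demo_list {
	struct demo_inode *head;
	unsigned long expire_interval;	/* e.g. 30 for b_dirty, 1800 for b_dirty_atime */
};

/*
 * How long the oldest inode on the list has been expired.
 * Returns 0 if the list is empty or its head has not expired yet.
 */
static unsigned long overdue(const struct demo_list *l, unsigned long now)
{
	unsigned long age;

	if (!l->head)
		return 0;
	age = now - l->head->dirtied_when;
	return age > l->expire_interval ? age - l->expire_interval : 0;
}

/*
 * Compare the heads of both lists and detach the inode that has been
 * expired for the longer time. Returns NULL if nothing has expired.
 */
static struct demo_inode *pick_next(struct demo_list *dirty,
				    struct demo_list *dirty_atime,
				    unsigned long now)
{
	unsigned long d = overdue(dirty, now);
	unsigned long a = overdue(dirty_atime, now);
	struct demo_list *from;
	struct demo_inode *inode;

	if (d == 0 && a == 0)
		return NULL;

	from = (d >= a) ? dirty : dirty_atime;
	inode = from->head;
	from->head = inode->next;
	inode->next = NULL;
	return inode;
}
```

Calling pick_next() in a loop until it returns NULL would drain every
expired inode from both lists without an intermediate b_io list, which
is the point of the suggestion. Ties go to the regular dirty list here,
which is an arbitrary choice made for the sketch.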