On Wed 07-09-11 10:19:47, Peter Zijlstra wrote:
> On Wed, 2011-09-07 at 08:56 +0200, Christoph Hellwig wrote:
> > On Wed, Sep 07, 2011 at 02:22:22AM +0200, Jan Kara wrote:
> > > > So wtf is ext4 doing? Shouldn't a page stay dirty until its written out?
> > > >
> > > > That is, should we really frob around this behaviour or fix ext4 because
> > > > its on crack?
> > >   Fengguang, could you please verify your findings with recent kernel? I
> > > believe ext4 got fixed in this regard some time ago already (and yes, old
> > > delalloc writeback code in ext4 was terrible).
> >
> > The pattern we do in writeback is:
> >
> > in pageout / write_cache_pages:
> >         lock_page();
> >         clear_page_dirty_for_io();
> >
> > in ->writepage:
> >         set_page_writeback();
> >         unlock_page();
> >         end_page_writeback();
> >
> > So whenever ->writepage decides it doesn't want to write things back
> > we have to redirty pages.  We have this happen quite a bit in every
> > filesystem, but ext4 hits it a lot more than usual because it refuses
> > to write out delalloc pages from plain ->writepage and only allows
> > ->writepages to do it.
>
> Ah, right, so it is a fairly common thing and not something easily fixed
> in filesystems.
  Well, it depends on what you call common - usually, ->writepage is called
from kswapd which shouldn't be common compared to writeback from a flusher
thread. But now I've realized that JBD2 also calls ->writepage to fulfill
data=ordered mode guarantees and that's what causes most of redirtying of
pages on ext4. That's going away eventually but it will take some time. So
for now writeback has to handle redirtying...

                                                                Honza
-- 
Jan Kara <jack@xxxxxxx>
SUSE Labs, CR
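
For readers following the thread, here is a rough user-space sketch of the
dirty / writeback sequence quoted above and of why a ->writepage that declines
to write must redirty the page. It is only a toy model, not kernel code: the
struct toy_page type, the delalloc flag and all toy_* function names are
made up for illustration, and the real kernel does the redirty through
redirty_page_for_writepage() with proper locking and accounting.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for struct page: only the state discussed in the thread. */
struct toy_page {
    bool locked;
    bool dirty;
    bool writeback;
    bool delalloc;  /* hypothetical flag: blocks not allocated yet */
};

/* Models clear_page_dirty_for_io(): clear the bit, report the old value. */
static bool toy_clear_dirty_for_io(struct toy_page *p)
{
    bool was_dirty = p->dirty;

    p->dirty = false;
    return was_dirty;
}

/*
 * Models a ->writepage that refuses delalloc pages (as ext4 is described
 * as doing). Returns 1 if the page was "written", 0 if it was redirtied.
 */
static int toy_writepage(struct toy_page *p)
{
    if (p->delalloc) {
        /* The caller already cleared the dirty bit, so put it back. */
        p->dirty = true;
        p->locked = false;
        return 0;
    }
    p->writeback = true;   /* set_page_writeback() */
    p->locked = false;     /* unlock_page() */
    /* ... the I/O would be submitted and complete here ... */
    p->writeback = false;  /* end_page_writeback() */
    return 1;
}

/* Models the caller side (pageout / write_cache_pages) for one page. */
static int toy_write_one(struct toy_page *p)
{
    p->locked = true;      /* lock_page() */
    if (!toy_clear_dirty_for_io(p)) {
        p->locked = false; /* nothing to write */
        return 0;
    }
    return toy_writepage(p);
}
```

Calling toy_write_one() on a dirty page with delalloc unset leaves it clean,
while the same call on a dirty delalloc page returns 0 and leaves it dirty
again, which is the redirtying that writeback has to cope with.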