On Wed, Oct 07, 2009 at 09:29:24PM +0800, Wu Fengguang wrote:
> On Wed, Oct 07, 2009 at 07:37:35PM +0800, Nikita Danilov wrote:
> > 2009/10/7 Wu Fengguang <fengguang.wu@xxxxxxxxx>:
> > > On Wed, Oct 07, 2009 at 06:38:37PM +0800, Nikita Danilov wrote:
> > >> Hello,
> > >>
> >
> > [...]
> >
> > > Glad to know about your experiences :) Interestingly I started with
> > > ->writepage() and then switched to ->writepages() because filesystems
> > > behave better with the latter (i.e. less file fragmentation).
> >
> > By the way, why is your patch doing
> >
> >         ->writepage(page->index);
> >         generic_writepages(page->index + 1, LUMPY_PAGEOUT_PAGES - 1);
> >
> > instead of
> >
> >         generic_writepages(page->index, LUMPY_PAGEOUT_PAGES);
> >
> > ? Is this because of the difficulties with passing back page-specific
> > errors from generic_writepages()?
>
> Yes. It's possible to tell write_cache_pages() to return
> AOP_WRITEPAGE_ACTIVATE. Other ->writepages() don't have to deal with
> this because their ->writepage() won't return AOP_WRITEPAGE_ACTIVATE
> at all.
>
> But it is going to be ugly to special-case the first locked page in
> every ->writepages() function..
>
> > >
> > > I'd like to just ignore the shmem case, by adding a
> > > bdi_cap_writeback_dirty() check. Because clustered writing to a swap
> > > device may be less of a gain.
> >
> > Or you can just call try_to_unmap() from shmem_writepage() when
> > wbc->for_reclaim is true.
>
> Hmm, it's more comfortable to stay away from shmem for the initial version.
> But feel free to submit a patch for it in future :)
>
> > > Page filtering should also be possible in write_cache_pages(). But
> > > what do you mean by "hard-to-fix races against inode reclamation"?
> >
> > The vmscan.c pageout path doesn't take a reference on the inode, so the
> > instant ->writepage() releases the lock on the page, the inode can be
> > freed.
>
> Ah, then we could just do igrab() if inode_lock is not locked?
> A bit ugly though.
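[As an aside, the split debated above -- ->writepage() on the target page, then generic_writepages() on the following LUMPY_PAGEOUT_PAGES - 1 pages -- boils down to clustering consecutive dirty neighbours behind the reclaim target. A userspace sketch of that counting; lumpy_cluster() and the dirty[] bitmap are hypothetical illustrations, not kernel code, and PAGE_SIZE stands in for PAGE_CACHE_SIZE:]

```c
#define PAGE_SIZE           4096
#define LUMPY_PAGEOUT_PAGES (512 * 1024 / PAGE_SIZE)	/* 128 pages */

/*
 * ->writepage() handles the target page itself, so its possible
 * AOP_WRITEPAGE_ACTIVATE result stays visible to reclaim; only the
 * following consecutive dirty neighbours are batched.  This counts how
 * many pages such a lumpy writeout would cover in total, capped at
 * LUMPY_PAGEOUT_PAGES.
 */
static int lumpy_cluster(const int *dirty, int nr_pages, int index)
{
	int n = 1;	/* the target page is always written */

	while (n < LUMPY_PAGEOUT_PAGES &&
	       index + n < nr_pages &&
	       dirty[index + n])
		n++;
	return n;
}
```

[So for a file with pages 2..6 dirty and page 7 clean, a reclaim pick of page 2 would piggyback pages 3..6 for 5 pages of IO total.]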
Since ->writepage() can sleep, we don't need to worry about inode_lock.

Here is the updated patch.

Thanks,
Fengguang
---
vmscan: lumpy pageout

When paging out a dirty page, try to piggyback more consecutive dirty
pages (up to 512KB) to improve IO efficiency. Only ext3/reiserfs, which
don't have their own aops->writepages, are supported in this initial
version.

CC: Dave Chinner <david@xxxxxxxxxxxxx>
CC: Nikita Danilov <danilov@xxxxxxxxx>
Signed-off-by: Wu Fengguang <fengguang.wu@xxxxxxxxx>
---
 mm/page-writeback.c |   12 ++++++++++++
 mm/vmscan.c         |   16 ++++++++++++++++
 2 files changed, 28 insertions(+)

--- linux.orig/mm/vmscan.c	2009-10-07 21:39:13.000000000 +0800
+++ linux/mm/vmscan.c	2009-10-07 21:39:57.000000000 +0800
@@ -344,6 +344,8 @@ typedef enum {
 	PAGE_CLEAN,
 } pageout_t;
 
+#define LUMPY_PAGEOUT_PAGES	(512 * 1024 / PAGE_CACHE_SIZE)
+
 /*
  * pageout is called by shrink_page_list() for each dirty page.
  * Calls ->writepage().
@@ -399,16 +401,30 @@ static pageout_t pageout(struct page *pa
 			.for_reclaim = 1,
 		};
 
+		igrab(mapping->host);
 		SetPageReclaim(page);
 		res = mapping->a_ops->writepage(page, &wbc);
 		if (res < 0)
 			handle_write_error(mapping, page, res);
 		if (res == AOP_WRITEPAGE_ACTIVATE) {
 			ClearPageReclaim(page);
+			iput(mapping->host);
 			return PAGE_ACTIVATE;
 		}
 
 		/*
+		 * only write_cache_pages() supports for_reclaim for now
+		 * ignore shmem for now, thanks to Nikita.
+		 */
+		if (bdi_cap_writeback_dirty(mapping->backing_dev_info) &&
+		    !mapping->a_ops->writepages) {
+			wbc.range_start = (page->index + 1) << PAGE_CACHE_SHIFT;
+			wbc.nr_to_write = LUMPY_PAGEOUT_PAGES - 1;
+			generic_writepages(mapping, &wbc);
+			iput(mapping->host);
+		}
+
+		/*
 		 * Wait on writeback if requested to. This happens when
 		 * direct reclaiming a large contiguous area and the
 		 * first attempt to free a range of pages fails.
--- linux.orig/mm/page-writeback.c	2009-10-07 21:39:13.000000000 +0800
+++ linux/mm/page-writeback.c	2009-10-07 21:39:14.000000000 +0800
@@ -805,6 +805,11 @@ int write_cache_pages(struct address_spa
 				break;
 			}
 
+			if (wbc->for_reclaim && done_index != page->index) {
+				done = 1;
+				break;
+			}
+
 			if (nr_to_write != wbc->nr_to_write &&
 			    done_index + WB_SEGMENT_DIST < page->index &&
 			    --wbc->nr_segments <= 0) {
@@ -846,6 +851,13 @@ continue_unlock:
 			if (!clear_page_dirty_for_io(page))
 				goto continue_unlock;
 
+			/*
+			 * active and unevictable pages will be checked at
+			 * rotate time
+			 */
+			if (wbc->for_reclaim)
+				SetPageReclaim(page);
+
 			ret = (*writepage)(page, wbc, data);
 			if (unlikely(ret)) {
 				if (ret == AOP_WRITEPAGE_ACTIVATE) {
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
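[For illustration, the for_reclaim contiguity check added to write_cache_pages() above can be modelled in userspace. count_contiguous() and the plain index array are hypothetical stand-ins; the real loop walks pagevec lookup results, but the done_index logic is the same:]

```c
/*
 * Walk dirty page indices in ascending order and stop at the first
 * gap: once done_index no longer matches the next page's index, a
 * reclaim-driven writeout sets done = 1 and bails out, so only one
 * contiguous run of pages is written per call.
 */
static int count_contiguous(const int *indices, int nr)
{
	int done_index;
	int i;

	if (nr == 0)
		return 0;
	done_index = indices[0];
	for (i = 0; i < nr; i++) {
		if (done_index != indices[i])
			break;		/* gap: done = 1 in the patch */
		done_index = indices[i] + 1;
	}
	return i;	/* pages written before the gap */
}
```

[E.g. for dirty indices {10, 11, 12, 15, 16} only the run 10..12 is written; the pages at 15 and 16 are left for a later pageout.]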