This is a note to let you know that I've just added the patch titled ext4: Use our own write_cache_pages() to the 2.6.27-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary The filename of the patch is: ext4-use-our-own-write_cache_pages.patch and it can be found in the queue-2.6.27 subdirectory. If you, or anyone else, feels it should not be added to the stable tree, please let <stable@xxxxxxxxxx> know about it. >From dev@xxxxxxxxxxxxxx Fri Jun 25 15:32:26 2010 From: Theodore Ts'o <tytso@xxxxxxx> Date: Fri, 28 May 2010 14:26:25 -0500 Subject: ext4: Use our own write_cache_pages() Cc: "Theodore Ts'o" <tytso@xxxxxxx>, Dave Chinner <david@xxxxxxxxxxxxx>, "Jayson R. King" <dev@xxxxxxxxxxxxxx>, Kay Diederichs <Kay.Diederichs@xxxxxxxxxxxxxxx>, Ext4 Developers List <linux-ext4@xxxxxxxxxxxxxxx>, "Aneesh Kumar K.V" <aneesh.kumar@xxxxxxxxxxxxxxxxxx> Message-ID: <4C0018E1.5060007@xxxxxxxxxxxxxx> From: Theodore Ts'o <tytso@xxxxxxx> commit 8e48dcfbd7c0892b4cfd064d682cc4c95a29df32 upstream. Make a copy of write_cache_pages() for the benefit of ext4_da_writepages(). This allows us to simplify the code some, and will allow us to further customize the code in future patches. There are some nasty hacks in write_cache_pages(), which Linus has (correctly) characterized as vile. I've just copied it into write_cache_pages_da(), without trying to clean those bits up lest I break something in the ext4's delalloc implementation, which is a bit fragile right now. This will allow Dave Chinner to clean up write_cache_pages() in mm/page-writeback.c, without worrying about breaking ext4. Eventually write_cache_pages_da() will go away when I rewrite ext4's delayed allocation and create a general ext4_writepages() which is used for all of ext4's writeback. Until now this is the lowest risk way to clean up the core write_cache_pages() function. Signed-off-by: "Theodore Ts'o" <tytso@xxxxxxx> Cc: Dave Chinner <david@xxxxxxxxxxxxx> [dev@xxxxxxxxxxxxxx: Dropped the hunks which reverted the use of no_nrwrite_index_update, since those lines weren't ever created on 2.6.27.y] [dev@xxxxxxxxxxxxxx: Copied from 2.6.27.y's version of write_cache_pages(), plus the changes to it from patch "vfs: Add no_nrwrite_index_update writeback control flag"] Signed-off-by: Jayson R. King <dev@xxxxxxxxxxxxxx> Signed-off-by: Greg Kroah-Hartman <gregkh@xxxxxxx> --- fs/ext4/inode.c | 144 +++++++++++++++++++++++++++++++++++++++++++++++++++----- 1 file changed, 132 insertions(+), 12 deletions(-) --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -2059,17 +2059,6 @@ static int __mpage_da_writepage(struct p struct buffer_head *bh, *head, fake; sector_t logical; - if (mpd->io_done) { - /* - * Rest of the page in the page_vec - * redirty then and skip then. We will - * try to to write them again after - * starting a new transaction - */ - redirty_page_for_writepage(wbc, page); - unlock_page(page); - return MPAGE_DA_EXTENT_TAIL; - } /* * Can we merge this page to current extent? */ @@ -2160,6 +2149,137 @@ static int __mpage_da_writepage(struct p } /* + * write_cache_pages_da - walk the list of dirty pages of the given + * address space and call the callback function (which usually writes + * the pages). + * + * This is a forked version of write_cache_pages(). Differences: + * Range cyclic is ignored. + * no_nrwrite_index_update is always presumed true + */ +static int write_cache_pages_da(struct address_space *mapping, + struct writeback_control *wbc, + struct mpage_da_data *mpd) +{ + struct backing_dev_info *bdi = mapping->backing_dev_info; + int ret = 0; + int done = 0; + struct pagevec pvec; + int nr_pages; + pgoff_t index; + pgoff_t end; /* Inclusive */ + long nr_to_write = wbc->nr_to_write; + + if (wbc->nonblocking && bdi_write_congested(bdi)) { + wbc->encountered_congestion = 1; + return 0; + } + + pagevec_init(&pvec, 0); + index = wbc->range_start >> PAGE_CACHE_SHIFT; + end = wbc->range_end >> PAGE_CACHE_SHIFT; + + while (!done && (index <= end)) { + int i; + + nr_pages = pagevec_lookup_tag(&pvec, mapping, &index, + PAGECACHE_TAG_DIRTY, + min(end - index, (pgoff_t)PAGEVEC_SIZE-1) + 1); + if (nr_pages == 0) + break; + + for (i = 0; i < nr_pages; i++) { + struct page *page = pvec.pages[i]; + + /* + * At this point, the page may be truncated or + * invalidated (changing page->mapping to NULL), or + * even swizzled back from swapper_space to tmpfs file + * mapping. However, page->index will not change + * because we have a reference on the page. + */ + if (page->index > end) { + done = 1; + break; + } + + lock_page(page); + + /* + * Page truncated or invalidated. We can freely skip it + * then, even for data integrity operations: the page + * has disappeared concurrently, so there could be no + * real expectation of this data interity operation + * even if there is now a new, dirty page at the same + * pagecache address. + */ + if (unlikely(page->mapping != mapping)) { +continue_unlock: + unlock_page(page); + continue; + } + + if (!PageDirty(page)) { + /* someone wrote it for us */ + goto continue_unlock; + } + + if (PageWriteback(page)) { + if (wbc->sync_mode != WB_SYNC_NONE) + wait_on_page_writeback(page); + else + goto continue_unlock; + } + + BUG_ON(PageWriteback(page)); + if (!clear_page_dirty_for_io(page)) + goto continue_unlock; + + ret = __mpage_da_writepage(page, wbc, mpd); + + if (unlikely(ret)) { + if (ret == AOP_WRITEPAGE_ACTIVATE) { + unlock_page(page); + ret = 0; + } else { + done = 1; + break; + } + } + + if (nr_to_write > 0) { + nr_to_write--; + if (nr_to_write == 0 && + wbc->sync_mode == WB_SYNC_NONE) { + /* + * We stop writing back only if we are + * not doing integrity sync. In case of + * integrity sync we have to keep going + * because someone may be concurrently + * dirtying pages, and we might have + * synced a lot of newly appeared dirty + * pages, but have not synced all of the + * old dirty pages. + */ + done = 1; + break; + } + } + + if (wbc->nonblocking && bdi_write_congested(bdi)) { + wbc->encountered_congestion = 1; + done = 1; + break; + } + } + pagevec_release(&pvec); + cond_resched(); + } + return ret; +} + + +/* * mpage_da_writepages - walk the list of dirty pages of the given * address space, allocates non-allocated blocks, maps newly-allocated * blocks to existing bhs and issue IO them @@ -2192,7 +2312,7 @@ static int mpage_da_writepages(struct ad to_write = wbc->nr_to_write; - ret = write_cache_pages(mapping, wbc, __mpage_da_writepage, mpd); + ret = write_cache_pages_da(mapping, wbc, mpd); /* * Handle last extent of pages Patches currently in stable-queue which might be from tytso@xxxxxxx are queue-2.6.27/ext4-fix-file-fragmentation-during-large-file-write.patch queue-2.6.27/ext4-use-our-own-write_cache_pages.patch queue-2.6.27/ext4-implement-range_cyclic-in-ext4_da_writepages-instead-of-write_cache_pages.patch queue-2.6.27/ext4-check-s_log_groups_per_flex-in-online-resize-code.patch -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html