On Mon, 14 Oct 2013 19:34:59 +0200 Jan Kara <jack@xxxxxxx> wrote: > On Mon 14-10-13 21:50:54, Ming Lei wrote: > > On Mon, Oct 14, 2013 at 8:58 PM, Jan Kara <jack@xxxxxxx> wrote: > > > Umm, I guess I see what are you pointing at. Thanks for catching that. > > > mpage_process_page_bufs() always adds a buffer to mpd even if nr_to_write > > > is already <= 0. But I would somewhat prefer not to call > > > mpage_prepare_extent_to_map() at all when nr_to_write <= 0. So a patch > > > like: > > > ret = mpage_prepare_extent_to_map(&mpd); > > > if (!ret) { > > > - if (mpd.map.m_len) > > > + if (mpd.map.m_len) { > > > ret = mpage_map_and_submit_extent(handle, &mpd, > > > &give_up_on_write); > > > - else { > > > + done = (wbc->nr_to_write <= 0); > > > + } else { > > > > > > Should also fix your problem, am I right? > > > > I am afraid it can't, because we need to stop scanning page cache > > if end of file is reached. > That should be OK. mpage_process_page_bufs() won't add a buffer beyond > EOF so we end the extent at EOF and next time we don't add anything to the > extent. My change wouldn't change anything in this. I mean wbc->nr_to_write may still be positive when mpage_prepare_extent_to_map() returns zero, so your change will cause ext4_writepages not to break from the loop, then write hangs, too crazy? > > > nr_to_write will become negative inside mpage_map_and_submit_extent(), > > that is why I fix it inside mpage_prepare_extent_to_map(). > Yes, mpage_map_and_submit_extent() creates negative nr_to_write but only > because mpage_prepare_extent_to_map() asked for mapping too big extent. But > if I'm reading the code correctly we first ask for writing the extent of > just the right size (nr_to_write becomes 0) but then ext4_writepages() asks > mpage_prepare_extent_to_map() for another extent and it will create a single > page extent for mapping before it finds out nr_to_write <= 0 and > terminates. Am I understanding the problem correctly? Your understanding is right, and I think it is a correct fix, because we shouldn't add more pages to extent than nr_to_write. > > After thinking about it again, moving the condition in > mpage_prepare_extent_to_map() as you did is probably better that what I > suggested. Just move it more up in the loop - like after page->index > end > condition. So that we don't unnecessarily lock the page etc. Looks it makes sense, so how about below change? -- diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index 32c04ab..c32b599 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -2294,7 +2294,7 @@ static int mpage_prepare_extent_to_map(struct mpage_da_data *mpd) { struct address_space *mapping = mpd->inode->i_mapping; struct pagevec pvec; - unsigned int nr_pages; + unsigned int nr_pages, nr_added = 0; pgoff_t index = mpd->first_page; pgoff_t end = mpd->last_page; int tag; @@ -2330,6 +2330,18 @@ static int mpage_prepare_extent_to_map(struct mpage_da_data *mpd) if (page->index > end) goto out; + /* + * Accumulated enough dirty pages? This doesn't apply + * to WB_SYNC_ALL mode. For integrity sync we have to + * keep going because someone may be concurrently + * dirtying pages, and we might have synced a lot of + * newly appeared dirty pages, but have not synced all + * of the old dirty pages. + */ + if (mpd->wbc->sync_mode == WB_SYNC_NONE && + nr_added >= mpd->wbc->nr_to_write) + goto out; + /* If we can't merge this page, we are done. */ if (mpd->map.m_len > 0 && mpd->next_page != page->index) goto out; @@ -2364,19 +2376,7 @@ static int mpage_prepare_extent_to_map(struct mpage_da_data *mpd) if (err <= 0) goto out; err = 0; - - /* - * Accumulated enough dirty pages? This doesn't apply - * to WB_SYNC_ALL mode. For integrity sync we have to - * keep going because someone may be concurrently - * dirtying pages, and we might have synced a lot of - * newly appeared dirty pages, but have not synced all - * of the old dirty pages. - */ - if (mpd->wbc->sync_mode == WB_SYNC_NONE && - mpd->next_page - mpd->first_page >= - mpd->wbc->nr_to_write) - goto out; + nr_added++; } pagevec_release(&pvec); cond_resched(); Thanks, -- Ming Lei -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html