Hello, On Sat, Feb 23, 2013 at 05:58:00PM -0500, Phillip Susi wrote: > The previous implementation initiated writeout for a non congested bdi, and > then discarded any clean pages. This had 3 problems: > > 1) The writeout would spin up the disk unnecessarily > 2) Discarding pages under low cache pressure is a waste > 3) It was useless on files being written, and thus full of dirty pages > > Now we just move the pages to the inactive list so they will be reclaimed > sooner. > --- > include/linux/fs.h | 2 ++ > mm/fadvise.c | 8 ++------ > mm/filemap.c | 43 +++++++++++++++++++++++++++++++++++++++++++ > 3 files changed, 47 insertions(+), 6 deletions(-) > > diff --git a/include/linux/fs.h b/include/linux/fs.h > index 7d2e893..2abd193 100644 > --- a/include/linux/fs.h > +++ b/include/linux/fs.h > @@ -2198,6 +2198,8 @@ extern int __filemap_fdatawrite_range(struct address_space *mapping, > loff_t start, loff_t end, int sync_mode); > extern int filemap_fdatawrite_range(struct address_space *mapping, > loff_t start, loff_t end); > +extern void filemap_deactivate_range(struct address_space *mapping, pgoff_t start, > + pgoff_t end); > > extern int vfs_fsync_range(struct file *file, loff_t start, loff_t end, > int datasync); > diff --git a/mm/fadvise.c b/mm/fadvise.c > index a47f0f5..fbd58b0 100644 > --- a/mm/fadvise.c > +++ b/mm/fadvise.c > @@ -112,17 +112,13 @@ SYSCALL_DEFINE(fadvise64_64)(int fd, loff_t offset, loff_t len, int advice) > case POSIX_FADV_NOREUSE: > break; > case POSIX_FADV_DONTNEED: > - if (!bdi_write_congested(mapping->backing_dev_info)) > - __filemap_fdatawrite_range(mapping, offset, endbyte, > - WB_SYNC_NONE); > - > /* First and last FULL page! */ > start_index = (offset+(PAGE_CACHE_SIZE-1)) >> PAGE_CACHE_SHIFT; > end_index = (endbyte >> PAGE_CACHE_SHIFT); > > if (end_index >= start_index) > - invalidate_mapping_pages(mapping, start_index, > - end_index); > + filemap_deactivate_range(mapping, start_index, > + end_index); > break; > default: > ret = -EINVAL; > diff --git a/mm/filemap.c b/mm/filemap.c > index c610076..bcdcdbf 100644 > --- a/mm/filemap.c > +++ b/mm/filemap.c > @@ -217,7 +217,49 @@ int __filemap_fdatawrite_range(struct address_space *mapping, loff_t start, > return ret; > } > > +/** > + * filemap_deactivate_range - moves pages in range to the inactive list > + * @mapping: the address_space which holds the pages to deactivate > + * @start: offset where the range starts > + * @end: offset where the range ends (inclusive) > + */ > +void filemap_deactivate_range(struct address_space *mapping, pgoff_t start, > + pgoff_t end) > +{ > + struct pagevec pvec; > + pgoff_t index = start; > + int i; > + > + /* > + * Note: this function may get called on a shmem/tmpfs mapping: > + * pagevec_lookup() might then return 0 prematurely (because it > + * got a gangful of swap entries); but it's hardly worth worrying > + * about - it can rarely have anything to free from such a mapping > + * (most pages are dirty), and already skips over any difficulties. > + */ > + > + pagevec_init(&pvec, 0); > + while (index <= end && pagevec_lookup(&pvec, mapping, index, > + min(end - index, (pgoff_t)PAGEVEC_SIZE - 1) + 1)) { > + mem_cgroup_uncharge_start(); > + for (i = 0; i < pagevec_count(&pvec); i++) { > + struct page *page = pvec.pages[i]; > + > + /* We rely upon deletion not changing page->index */ > + index = page->index; > + if (index > end) > + break; > + > + WARN_ON(page->index != index); > + deactivate_page(page); > + } > + pagevec_release(&pvec); > + mem_cgroup_uncharge_end(); > + cond_resched(); > + index++; > + } > +} > + > static inline int __filemap_fdatawrite(struct address_space *mapping, > int sync_mode) > { > -- > 1.8.1.2 > Just FYI, there was a person tried to solve similar problem, Andrea Righi. Ccing him, Personally, I like this but unfortunately maintainer didn't like it due to breaking compatibility and I understand. https://lkml.org/lkml/2011/6/28/468 You might need another round with akpm. :( -- Kind regards, Minchan Kim -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>