From: Chris Frost <frost@xxxxxxxxxxx>

Ensure that cached pages in the inactive list are not prematurely evicted;
move such pages to lru head when they are covered by

- in-kernel heuristic readahead
- a posix_fadvise(POSIX_FADV_WILLNEED) hint from an application

Before this patch, pages already in core could be evicted before pages that
were covered by the same prefetch scan but were not yet in core.  This
behavior can force many small read requests onto the disk.

In particular, posix_fadvise(... POSIX_FADV_WILLNEED) on an in-core page
has no effect on the page's location in the LRU list, even if it is the
next victim on the inactive list.

This change helps address the performance problems we encountered while
modifying SQLite and the GIMP to use large file prefetching.  Overall these
prefetching techniques improved the runtime of large benchmarks by 10-17x
for these applications.  More in the publication _Reducing Seek Overhead
with Application-Directed Prefetching_ in USENIX ATC 2009 and at
http://libprefetch.cs.ucla.edu/.

Notes from Fengguang:

I'm actually not afraid of it adding memory pressure to the readahead
thrashing case.  The context readahead can adaptively control the memory
pressure with or without this patch.

It does add memory pressure to mmap read-around.  A typical read-around
request would cover some cached pages (whether or not they are
memory-mapped), and all those pages would be moved to LRU head by this
patch.

This implicitly adds LRU lifetime to executable/lib pages.  Hopefully this
won't behave too badly.  Note that the read-around size will be limited in
small-memory systems, which in turn reduces the risk of this patch.

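For reference, an application issues the hint discussed above roughly as
follows.  This is only a minimal user-space sketch: prefetch_whole_file() is
a hypothetical helper name, not something from this patch or from
libprefetch, and it assumes a POSIX.1-2001 system that provides
posix_fadvise().

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper (an illustration, not part of the patch): tell the
 * kernel that the whole file will be read soon.
 * Returns 0 on success or an errno value on failure.
 */
static int prefetch_whole_file(const char *path)
{
	int fd, err;

	assert(path != NULL);

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return errno;

	/* offset 0 with len 0 covers the file from its start to its end */
	err = posix_fadvise(fd, 0, 0, POSIX_FADV_WILLNEED);

	close(fd);
	return err;	/* posix_fadvise() returns the error number directly */
}
```

The hint is advisory; the call returning 0 says nothing about whether the
pages are, or will stay, in core.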
Acked-by: Rik van Riel <riel@xxxxxxxxxx>
Signed-off-by: Chris Frost <frost@xxxxxxxxxxx>
Signed-off-by: Steve VanDeBogart <vandebo@xxxxxxxxxxx>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@xxxxxxxxxxxxxx>
Signed-off-by: Wu Fengguang <fengguang.wu@xxxxxxxxx>
---
 mm/readahead.c |   44 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 44 insertions(+)

--- linux.orig/mm/readahead.c	2010-02-24 10:44:26.000000000 +0800
+++ linux/mm/readahead.c	2010-02-24 10:44:40.000000000 +0800
@@ -9,7 +9,9 @@
 
 #include <linux/kernel.h>
 #include <linux/fs.h>
+#include <linux/memcontrol.h>
 #include <linux/mm.h>
+#include <linux/mm_inline.h>
 #include <linux/module.h>
 #include <linux/blkdev.h>
 #include <linux/backing-dev.h>
@@ -133,6 +135,40 @@ out:
 }
 
 /*
+ * The file range is expected to be accessed in near future.  Move pages
+ * (possibly in inactive lru tail) to lru head, so that they are retained
+ * in memory for some reasonable time.
+ */
+static void retain_inactive_pages(struct address_space *mapping,
+				  pgoff_t index, int len)
+{
+	int i;
+	struct page *page;
+	struct zone *zone;
+
+	for (i = 0; i < len; i++) {
+		page = find_get_page(mapping, index + i);
+		if (!page)
+			continue;
+
+		zone = page_zone(page);
+		spin_lock_irq(&zone->lru_lock);
+
+		if (PageLRU(page) &&
+		    !PageActive(page) &&
+		    !PageUnevictable(page)) {
+			int lru = page_lru_base_type(page);
+
+			del_page_from_lru_list(zone, page, lru);
+			add_page_to_lru_list(zone, page, lru);
+		}
+
+		spin_unlock_irq(&zone->lru_lock);
+		put_page(page);
+	}
+}
+
+/*
  * __do_page_cache_readahead() actually reads a chunk of disk.  It allocates all
  * the pages first, then submits them all for I/O. This avoids the very bad
  * behaviour which would occur if page allocations are causing VM writeback.
@@ -184,6 +220,14 @@ __do_page_cache_readahead(struct address
 	}
 
 	/*
+	 * Normally readahead will auto stop on cached segments, so we won't
+	 * hit many cached pages. If it does happen, bring the inactive pages
+	 * adjacent to the newly prefetched ones (if any).
+	 */
+	if (ret < nr_to_read)
+		retain_inactive_pages(mapping, offset, page_idx);
+
+	/*
 	 * Now start the IO.  We ignore I/O errors - if the page is not
 	 * uptodate then the caller will launch readpage again, and
 	 * will then handle the error.
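
The effect of retain_inactive_pages() on eviction order can be illustrated
with a user-space toy model.  This is only a sketch under stated
assumptions: the toy_lru type and the toy_* functions are hypothetical
names, the inactive list is modelled as a small array with the head at
slot 0 and the eviction victim at the tail, and locking, zones, page flags
and reference counts are all left out.  It is not the kernel's LRU code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_LRU_MAX 16

/* Toy inactive list: pages[0] is the head, pages[count - 1] is the tail. */
struct toy_lru {
	unsigned long pages[TOY_LRU_MAX];	/* page indexes, head first */
	size_t count;
};

/* Insert a page at the head, shifting the existing entries toward the tail. */
static void toy_lru_add_head(struct toy_lru *lru, unsigned long index)
{
	assert(lru->count < TOY_LRU_MAX);
	memmove(&lru->pages[1], &lru->pages[0],
		lru->count * sizeof(lru->pages[0]));
	lru->pages[0] = index;
	lru->count++;
}

/* Remove a page if it is on the list; returns true when it was found. */
static bool toy_lru_del(struct toy_lru *lru, unsigned long index)
{
	for (size_t i = 0; i < lru->count; i++) {
		if (lru->pages[i] != index)
			continue;
		memmove(&lru->pages[i], &lru->pages[i + 1],
			(lru->count - i - 1) * sizeof(lru->pages[0]));
		lru->count--;
		return true;
	}
	return false;
}

/*
 * Toy analogue of retain_inactive_pages(): every page in
 * [index, index + len) that is already on the list is moved to the head.
 * Pages that are not cached are skipped.
 */
static void toy_retain_pages(struct toy_lru *lru, unsigned long index, int len)
{
	for (int i = 0; i < len; i++) {
		if (toy_lru_del(lru, index + i))
			toy_lru_add_head(lru, index + i);
	}
}

/* Evict the page at the tail and return its index. */
static unsigned long toy_lru_evict(struct toy_lru *lru)
{
	assert(lru->count > 0);
	return lru->pages[--lru->count];
}
```

In this model, if page 7 sits at the tail and a prefetch covers pages 5..8,
calling toy_retain_pages(&lru, 5, 4) before adding the newly read pages
means the next eviction takes some unrelated page rather than page 7.
Without that call, page 7 would be the first victim even though it belongs
to the range that was just prefetched.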