On Tue, Jun 15, 2010 at 12:43:42PM +0100, Mel Gorman wrote: > <SNIP on whether sorting should be in page cache or block layer> > > > It would be interesting to code up a little test patch though, see if > > there's benefit to be had going down this path. > > > > I'll do this just to see what it looks like. To be frank, I lack taste when > it comes to how the block layer and filesystem should behave so am having > troube deciding if sorting the pages prior to submission is a good thing or > if it would just encourage bad or lax behaviour in the IO submission queueing. > The patch to sort the list being cleaned by reclaim looks like this. It's not actually tested vmscan: Sort pages being queued for IO before submitting to the filesystem While page reclaim submits dirty pages in batch, it doesn't change the order in which the IO is issued - it is still issued in LRU order. Given that they are issued in a short period of time now, rather than across a longer scan period, it is likely that it will not be any faster as: a) IO will not be started as soon, and b) the IO scheduler still only has a small re-ordering window and will choke just as much on random IO patterns. This patch uses list_sort() function to sort the list; sorting the list of pages by mapping and page->index within the mapping would result in all the pages on each mapping being sent down in ascending offset order at once - exactly how the filesystems want IO to be sent to it. Credit mostly goes to Dave Chinner for this idea and the changelog text. ---- diff --git a/mm/vmscan.c b/mm/vmscan.c index 68b3d22..02ab246 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -32,6 +32,7 @@ #include <linux/topology.h> #include <linux/cpu.h> #include <linux/cpuset.h> +#include <linux/list_sort.h> #include <linux/notifier.h> #include <linux/rwsem.h> #include <linux/delay.h> @@ -651,6 +652,34 @@ static noinline_for_stack void free_page_list(struct list_head *free_pages) __pagevec_free(&freed_pvec); } +/* Sort based on mapping then index */ +static int page_writeback_cmp(void *data, struct list_head *a, struct list_head *b) +{ + struct page *ap = list_entry(a, struct page, lru); + struct page *bp = list_entry(b, struct page, lru); + pgoff_t diff; + + /* + * Page not locked but it's not critical, the mapping is just for sorting + * If the mapping is no longer valid, it's of little consequence + */ + if (ap->mapping != bp->mapping) { + if (ap->mapping < bp->mapping) + return -1; + if (ap->mapping > bp->mapping) + return 1; + return 0; + } + + /* Then index */ + diff = ap->index - bp->index; + if (diff < 0) + return -1; + if (diff > 0) + return 1; + return 0; +} + static noinline_for_stack void clean_page_list(struct list_head *page_list, struct scan_control *sc) { @@ -660,6 +689,8 @@ static noinline_for_stack void clean_page_list(struct list_head *page_list, if (!sc->may_writepage) return; + list_sort(NULL, page_list, page_writeback_cmp); + /* Write the pages out to disk in ranges where possible */ while (!list_empty(page_list)) { struct address_space *mapping; -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html