Re: [PATCH 11/12] vmscan: Write out dirty pages in batch

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Jun 15, 2010 at 12:43:42PM +0100, Mel Gorman wrote:
> <SNIP on whether sorting should be in page cache or block layer>
> 
> > It would be interesting to code up a little test patch though, see if
> > there's benefit to be had going down this path.
> > 
> 
> I'll do this just to see what it looks like. To be frank, I lack taste when
> it comes to how the block layer and filesystem should behave so am having
> troube deciding if sorting the pages prior to submission is a good thing or
> if it would just encourage bad or lax behaviour in the IO submission queueing.
> 

The patch to sort the list being cleaned by reclaim looks like this.
It's not actually tested

vmscan: Sort pages being queued for IO before submitting to the filesystem

While page reclaim submits dirty pages in batch, it doesn't change the
order in which the IO is issued - it is still issued in LRU order. Given
that they are issued in a short period of time now, rather than across a
longer scan period, it is likely that it will not be any faster as:

        a) IO will not be started as soon, and
        b) the IO scheduler still only has a small re-ordering
           window and will choke just as much on random IO patterns.

This patch uses list_sort() function to sort
the list; sorting the list of pages by mapping and page->index
within the mapping would result in all the pages on each mapping
being sent down in ascending offset order at once - exactly how the
filesystems want IO to be sent to it.

Credit mostly goes to Dave Chinner for this idea and the changelog text.

----

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 68b3d22..02ab246 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -32,6 +32,7 @@
 #include <linux/topology.h>
 #include <linux/cpu.h>
 #include <linux/cpuset.h>
+#include <linux/list_sort.h>
 #include <linux/notifier.h>
 #include <linux/rwsem.h>
 #include <linux/delay.h>
@@ -651,6 +652,34 @@ static noinline_for_stack void free_page_list(struct list_head *free_pages)
 		__pagevec_free(&freed_pvec);
 }
 
+/* Sort based on mapping then index */
+static int page_writeback_cmp(void *data, struct list_head *a, struct list_head *b)
+{
+	struct page *ap = list_entry(a, struct page, lru);
+	struct page *bp = list_entry(b, struct page, lru);
+	pgoff_t diff;
+
+	/*
+	 * Page not locked but it's not critical, the mapping is just for sorting
+	 * If the mapping is no longer valid, it's of little consequence
+	 */
+	if (ap->mapping != bp->mapping) {
+		if (ap->mapping < bp->mapping)
+			return -1;
+		if (ap->mapping > bp->mapping)
+			return 1;
+		return 0;
+	}
+	
+	/* Then index */
+	diff = ap->index - bp->index;
+	if (diff < 0)
+		return -1;
+	if (diff > 0)
+		return 1;
+	return 0;
+}
+
 static noinline_for_stack void clean_page_list(struct list_head *page_list,
 				struct scan_control *sc)
 {
@@ -660,6 +689,8 @@ static noinline_for_stack void clean_page_list(struct list_head *page_list,
 	if (!sc->may_writepage)
 		return;
 
+	list_sort(NULL, page_list, page_writeback_cmp);
+
 	/* Write the pages out to disk in ranges where possible */
 	while (!list_empty(page_list)) {
 		struct address_space *mapping;
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [Samba]     [Device Mapper]     [CEPH Development]
  Powered by Linux