+ add-truncate_unmap_inode_pages_range.patch added to -mm tree

The patch titled
     add truncate_unmap_inode_pages_range()
has been added to the -mm tree.  Its filename is
     add-truncate_unmap_inode_pages_range.patch

*** Remember to use Documentation/SubmitChecklist when testing your code ***

See http://www.zip.com.au/~akpm/linux/patches/stuff/added-to-mm.txt to find
out what to do about this

------------------------------------------------------
Subject: add truncate_unmap_inode_pages_range()
From: David Chinner <dgc@xxxxxxx>

With the recent changes to cancel_dirty_page(), XFS will dump warnings in
the syslog because it can call truncate_inode_pages() on dirty mapped pages.

I've determined that this is indeed correct behaviour for XFS, as it can
happen when mmap()ed files race with direct I/O.  When we do a direct I/O
read, we flush the dirty pages to disk, then truncate them out of the page
cache.  Unfortunately, between the flush and the truncate, a write through
the mmap can dirty the page again, so we end up tossing a dirty page that
is still mapped.
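To make the interleaving concrete, here is a minimal userspace model in C.
It is only a sketch under stated assumptions: `struct model_page` and the
`model_*` helpers are hypothetical stand-ins, not kernel structures or APIs,
and each kernel step is reduced to flipping a flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of one page-cache page; NOT the kernel's struct page. */
struct model_page {
	bool dirty;   /* page holds data not yet written to disk */
	bool mapped;  /* page is mapped into a process via mmap() */
	bool present; /* page is still in the page cache */
};

/* Step 1 of the direct I/O read: write dirty cached data back to disk. */
static void model_flush(struct model_page *p)
{
	p->dirty = false;
}

/* A store through the still-present mmap() redirties the page. */
static void model_mmap_write(struct model_page *p)
{
	if (p->present && p->mapped)
		p->dirty = true;
}

/*
 * Step 2 of the direct I/O read: drop the page from the cache.
 * Returns true if a dirty, mapped page had to be tossed, which is the
 * situation that triggers the cancel_dirty_page() warning.
 */
static bool model_truncate(struct model_page *p)
{
	bool tossed_dirty_mapped = p->dirty && p->mapped;

	p->present = false;
	p->dirty = false;
	return tossed_dirty_mapped;
}

/* Replay the interleaving described above: flush, redirty, truncate. */
static bool replay_dio_read_race(void)
{
	struct model_page p = { .dirty = true, .mapped = true, .present = true };

	model_flush(&p);           /* direct I/O read flushes first ...     */
	model_mmap_write(&p);      /* ... an mmap() store sneaks in ...     */
	return model_truncate(&p); /* ... so truncate sees a dirty mapped page */
}
```

Without the intervening mmap() store, model_truncate() finds a clean page;
with it, replay_dio_read_race() reports a dirty mapped page.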

None of the existing functions for truncating or invalidating pages work in
this situation.  Invalidating a page only works for clean pages with clean
buffers, and the invalidation functions only operate on whole pages, whereas
XFS requires partial page truncation.
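As a rough illustration of what partial page truncation involves, the sketch
below zeroes the tail of a single page from the truncation point onwards
while keeping the bytes before it.  It assumes 4 KiB pages, and
`partial_offset()` / `zero_page_tail()` are hypothetical helpers, not kernel
functions; in the kernel the partial page goes through truncate_partial_page()
instead.

```c
#include <assert.h>
#include <string.h>

#define MODEL_PAGE_SIZE 4096u /* assumption: 4 KiB pages */

/* Offset of the truncation point inside its page. */
static unsigned partial_offset(long long lstart)
{
	return (unsigned)(lstart & (MODEL_PAGE_SIZE - 1));
}

/*
 * Zero the part of a page that lies beyond the truncation point.
 * Bytes [0, partial) are kept; bytes [partial, MODEL_PAGE_SIZE) are cleared.
 */
static void zero_page_tail(unsigned char *page, unsigned partial)
{
	assert(partial < MODEL_PAGE_SIZE);
	memset(page + partial, 0, MODEL_PAGE_SIZE - partial);
}
```

Truncating at byte 10000, for example, lands 1808 bytes into the third page
(10000 = 2 * 4096 + 1808), so only that page's last 2288 bytes are cleared.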

On top of that, the page invalidation functions don't call into the
filesystem to invalidate the page, so the filesystem can't invalidate the
page properly (e.g. act on private buffer head flags).

So that leaves us needing truncate semantics, and the problem is that none
of the truncate functions unmap pages in a non-racy manner: if they unmap
pages at all, they do it separately from the truncate of the page, leaving a
window in which mmap can redirty the page between the unmap and the truncate.

Hence we need a truncate function that unmaps the pages while they are
locked for truncate in a similar fashion to
invalidate_inode_pages2_range().  The following patch (unchanged from the
last time it was sent) does this.  The XFS changes are in a second patch.
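The difference between the two orderings can be sketched with another toy
model in C.  The key modelling assumption, stated loudly: a fault cannot
remap a page while the truncate path holds its page lock, and cannot map a
page that has already left the page cache.  `struct model_page`,
`model_write_fault()` and the two truncate functions are hypothetical;
locked_truncate() only mirrors the shape of the `while (page_mapped(page))`
loop in the patch's unmap_single_page().

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of a page-cache page; NOT the kernel's struct page. */
struct model_page {
	bool locked;
	bool mapped;
	bool dirty;
	bool present;
};

/*
 * Hypothetical write fault.  Modelling assumption: it must take the page
 * lock before it can (re)map the page, and it gives up if the page has
 * already been removed from the page cache.
 */
static bool model_write_fault(struct model_page *p)
{
	if (p->locked || !p->present)
		return false; /* would block, or the page is gone */
	p->mapped = true;
	p->dirty = true;
	return true;
}

static void model_unmap(struct model_page *p)
{
	p->mapped = false;
}

/* Racy ordering: unmap first, then lock and truncate later. */
static bool racy_truncate(struct model_page *p)
{
	model_unmap(p);
	model_write_fault(p); /* window: the fault remaps and redirties */
	p->locked = true;
	bool bad = p->dirty && p->mapped;
	p->present = false;
	p->locked = false;
	return bad;
}

/* Patch's ordering: unmap while holding the page lock, then truncate. */
static bool locked_truncate(struct model_page *p)
{
	p->locked = true;
	while (p->mapped) /* mirrors the unmap_single_page() loop */
		model_unmap(p);
	bool fault_got_in = model_write_fault(p); /* refused: page is locked */
	bool bad = p->dirty && p->mapped;
	p->present = false; /* truncate_complete_page() analogue */
	p->locked = false;
	fault_got_in = fault_got_in || model_write_fault(p); /* page is gone */
	return bad || fault_got_in;
}
```

Starting from a clean, mapped page, racy_truncate() ends up truncating a
dirty mapped page, while locked_truncate() never lets the fault in.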

The patch has been tested on ia64 and x86-64 via XFSQA and a lot of fsx runs
mixing mmap and direct I/O operations.

Signed-off-by: Dave Chinner <dgc@xxxxxxx>
Cc: Nick Piggin <nickpiggin@xxxxxxxxxxxx>
Signed-off-by: Andrew Morton <akpm@xxxxxxxx>
---

 include/linux/mm.h |    2 +
 mm/truncate.c      |   60 ++++++++++++++++++++++++++++++++++++++-----
 2 files changed, 56 insertions(+), 6 deletions(-)

diff -puN include/linux/mm.h~add-truncate_unmap_inode_pages_range include/linux/mm.h
--- a/include/linux/mm.h~add-truncate_unmap_inode_pages_range
+++ a/include/linux/mm.h
@@ -1059,6 +1059,8 @@ extern unsigned long page_unuse(struct p
 extern void truncate_inode_pages(struct address_space *, loff_t);
 extern void truncate_inode_pages_range(struct address_space *,
 				       loff_t lstart, loff_t lend);
+extern void truncate_unmap_inode_pages_range(struct address_space *,
+				       loff_t lstart, loff_t lend, int unmap);
 
 /* generic vm_area_ops exported for stackable file systems */
 extern struct page *filemap_nopage(struct vm_area_struct *, unsigned long, int *);
diff -puN mm/truncate.c~add-truncate_unmap_inode_pages_range mm/truncate.c
--- a/mm/truncate.c~add-truncate_unmap_inode_pages_range
+++ a/mm/truncate.c
@@ -59,7 +59,7 @@ void cancel_dirty_page(struct page *page
 
 		WARN_ON(++warncount < 5);
 	}
-		
+
 	if (TestClearPageDirty(page)) {
 		struct address_space *mapping = page->mapping;
 		if (mapping && mapping_cap_account_dirty(mapping)) {
@@ -122,16 +122,34 @@ invalidate_complete_page(struct address_
 	return ret;
 }
 
+/*
+ * This is a helper for truncate_unmap_inode_page. Unmap the page we
+ * are passed. Page must be locked by the caller.
+ */
+static void
+unmap_single_page(struct address_space *mapping, struct page *page)
+{
+	BUG_ON(!PageLocked(page));
+	while (page_mapped(page)) {
+		unmap_mapping_range(mapping,
+			(loff_t)page->index << PAGE_CACHE_SHIFT,
+			PAGE_CACHE_SIZE, 0);
+	}
+}
+
 /**
- * truncate_inode_pages - truncate range of pages specified by start and
+ * truncate_unmap_inode_pages_range - truncate range of pages specified by
+ * start and end byte offsets and optionally unmap them first.
  * end byte offsets
  * @mapping: mapping to truncate
  * @lstart: offset from which to truncate
  * @lend: offset to which to truncate
+ * @unmap: unmap whole truncated pages if non-zero
  *
  * Truncate the page cache, removing the pages that are between
  * specified offsets (and zeroing out partial page
- * (if lstart is not page aligned)).
+ * (if lstart is not page aligned)). If specified, unmap the pages
+ * before they are removed.
  *
  * Truncate takes two passes - the first pass is nonblocking.  It will not
  * block on page locks and it will not block on writeback.  The second pass
@@ -146,8 +164,8 @@ invalidate_complete_page(struct address_
  * mapping is large, it is probably the case that the final pages are the most
  * recently touched, and freeing happens in ascending file offset order.
  */
-void truncate_inode_pages_range(struct address_space *mapping,
-				loff_t lstart, loff_t lend)
+void truncate_unmap_inode_pages_range(struct address_space *mapping,
+				loff_t lstart, loff_t lend, int unmap)
 {
 	const pgoff_t start = (lstart + PAGE_CACHE_SIZE-1) >> PAGE_CACHE_SHIFT;
 	pgoff_t end;
@@ -162,6 +180,14 @@ void truncate_inode_pages_range(struct a
 	BUG_ON((lend & (PAGE_CACHE_SIZE - 1)) != (PAGE_CACHE_SIZE - 1));
 	end = (lend >> PAGE_CACHE_SHIFT);
 
+	/*
+	 * if unmapping, do a range unmap up front to minimise the
+	 * overhead of unmapping the pages
+	 */
+	if (unmap) {
+		unmap_mapping_range(mapping, (loff_t)start << PAGE_CACHE_SHIFT,
+					   (loff_t)end << PAGE_CACHE_SHIFT, 0);
+	}
 	pagevec_init(&pvec, 0);
 	next = start;
 	while (next <= end &&
@@ -184,6 +210,8 @@ void truncate_inode_pages_range(struct a
 				unlock_page(page);
 				continue;
 			}
+			if (unmap)
+				unmap_single_page(mapping, page);
 			truncate_complete_page(mapping, page);
 			unlock_page(page);
 		}
@@ -195,6 +223,8 @@ void truncate_inode_pages_range(struct a
 		struct page *page = find_lock_page(mapping, start - 1);
 		if (page) {
 			wait_on_page_writeback(page);
+			if (unmap)
+				unmap_single_page(mapping, page);
 			truncate_partial_page(page, partial);
 			unlock_page(page);
 			page_cache_release(page);
@@ -224,12 +254,30 @@ void truncate_inode_pages_range(struct a
 			if (page->index > next)
 				next = page->index;
 			next++;
+			if (unmap)
+				unmap_single_page(mapping, page);
 			truncate_complete_page(mapping, page);
 			unlock_page(page);
 		}
 		pagevec_release(&pvec);
 	}
 }
+EXPORT_SYMBOL(truncate_unmap_inode_pages_range);
+
+/**
+ * truncate_inode_pages_range - truncate range of pages specified by start and
+ * end byte offsets
+ * @mapping: mapping to truncate
+ * @lstart: offset from which to truncate
+ * @lend: offset to which to truncate
+ *
+ * Called under (and serialised by) inode->i_mutex.
+ */
+void truncate_inode_pages_range(struct address_space *mapping,
+				loff_t lstart, loff_t lend)
+{
+	truncate_unmap_inode_pages_range(mapping, lstart, lend, 0);
+}
 EXPORT_SYMBOL(truncate_inode_pages_range);
 
 /**
@@ -241,7 +289,7 @@ EXPORT_SYMBOL(truncate_inode_pages_range
  */
 void truncate_inode_pages(struct address_space *mapping, loff_t lstart)
 {
-	truncate_inode_pages_range(mapping, lstart, (loff_t)-1);
+	truncate_unmap_inode_pages_range(mapping, lstart, (loff_t)-1, 0);
 }
 EXPORT_SYMBOL(truncate_inode_pages);
 
_
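For reference, the page-index arithmetic at the top of
truncate_unmap_inode_pages_range() can be replayed in userspace.  This sketch
assumes 4 KiB pages; `compute_range()` and `struct trunc_range` are
hypothetical names.  The `start`/`end` computations and the alignment check
mirror the lines visible in the hunk above, while the `partial` calculation
lives in the unchanged part of the function and is assumed here to be
`lstart & (PAGE_CACHE_SIZE - 1)`.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_PAGE_SHIFT 12 /* assumption: 4 KiB pages */
#define MODEL_PAGE_SIZE  (1LL << MODEL_PAGE_SHIFT)

struct trunc_range {
	uint64_t start;   /* first page index removed completely */
	uint64_t end;     /* last page index removed completely */
	unsigned partial; /* offset into page start - 1 to zero from;
	                     0 means lstart was page aligned */
};

/*
 * Mirror of the index arithmetic in truncate_unmap_inode_pages_range().
 * Only non-negative lend is modelled; the kernel's (loff_t)-1 "to end of
 * file" case relies on an arithmetic right shift and is left out here.
 */
static struct trunc_range compute_range(int64_t lstart, int64_t lend)
{
	struct trunc_range r;

	assert(lstart >= 0 && lend >= 0);
	/* same constraint as the BUG_ON(): lend must be the last byte of a page */
	assert((lend & (MODEL_PAGE_SIZE - 1)) == MODEL_PAGE_SIZE - 1);

	r.start = (uint64_t)(lstart + MODEL_PAGE_SIZE - 1) >> MODEL_PAGE_SHIFT;
	r.end = (uint64_t)lend >> MODEL_PAGE_SHIFT;
	r.partial = (unsigned)(lstart & (MODEL_PAGE_SIZE - 1));
	return r;
}
```

For example, truncating from byte 10000 to the end of page 4 (lend = 20479)
gives start = 3, end = 4 and partial = 1808: page 2 is zeroed from offset
1808 onwards and pages 3 and 4 are removed entirely.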

Patches currently in -mm which might be from dgc@xxxxxxx are

add-truncate_unmap_inode_pages_range.patch
xfs-remove-useless-wmb-memory-barrier.patch
make-bh_unwritten-a-first-class-bufferhead-flag-v2.patch
make-xfs-use-bh_unwritten-and-bh_delay-correctly.patch
sysctl-xfs-remove-unnecessary-insert_at_head-flag.patch
sysctl-c99-convert-xfs-ctl_tables.patch
sysctl-c99-convert-xfs-ctl_tables-fixes.patch
