+ tmpfs-fix-shmem_evict_inode-warnings-on-i_blocks.patch added to -mm tree

akpm@xxxxxxxxxxxxxxxxxxxx · Wed, 02 Dec 2015 13:50:16 -0800

The patch titled
     Subject: tmpfs: fix shmem_evict_inode() warnings on i_blocks
has been added to the -mm tree.  Its filename is
     tmpfs-fix-shmem_evict_inode-warnings-on-i_blocks.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/tmpfs-fix-shmem_evict_inode-warnings-on-i_blocks.patch
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/tmpfs-fix-shmem_evict_inode-warnings-on-i_blocks.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/SubmitChecklist when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Hugh Dickins <hughd@xxxxxxxxxx>
Subject: tmpfs: fix shmem_evict_inode() warnings on i_blocks

Dmitry Vyukov provides a little program, autogenerated by syzkaller, which
races a fault on a mapping of a sparse memfd object, against truncation of
that object below the fault address: run repeatedly for a few minutes, it
reliably generates shmem_evict_inode()'s WARN_ON(inode->i_blocks).

(But there's nothing specific to memfd here, nor to the fstat which it
happened to use to generate the fault: though that looked suspicious,
since a shmem_recalc_inode() had been added there recently.  The same
problem can be reproduced with open+unlink in place of memfd_create, and
with fstatfs in place of fstat.)

v3.7 commit 0f3c42f522dc ("tmpfs: change final i_blocks BUG to WARNING")
explains one cause of such a warning (a race with shmem_writepage to
swap), and possible solutions; but we never took it further, and this
syzkaller incident turns out to have a different cause.

shmem_getpage_gfp()'s error recovery, when a freshly allocated page is
then found to be beyond eof, looks plausible - decrementing the alloced
count that was just before incremented - but in fact can go wrong, if a
racing thread (the truncator, for example) gets its shmem_recalc_inode()
in just after our delete_from_page_cache().  delete_from_page_cache()
decrements nrpages, that shmem_recalc_inode() will balance the books by
decrementing alloced itself, then our decrement of alloced take it one too
low: leading to the WARNING when the object is finally evicted.

Once the new page has been exposed in the page cache, shmem_getpage_gfp()
must leave it to shmem_recalc_inode() itself to get the accounting right
in all cases (and not fall through from "trunc:" to "decused:").  Adjust
that error recovery block; and the reinitialization of info and sbinfo can
be removed too.

While we're here, fix shmem_writepage() to avoid the original issue: it
will be safe against a racing shmem_recalc_inode(), if it merely
increments swapped before the shmem_delete_from_page_cache() which
decrements nrpages (but it must then do its own shmem_recalc_inode()
before that, while still in balance, instead of after).  (Aside: why do we
shmem_recalc_inode() here in the swap path?  Because its raison d'etre is
to cope with clean sparse shmem pages being reclaimed behind our back: so
here when swapping is a good place to look for that case.) But I've not
now managed to reproduce this bug, even without the patch.

I don't see why I didn't do that earlier: perhaps inhibited by the
preference to eliminate shmem_recalc_inode() altogether.  Driven by this
incident, I do now have a patch to do so at last; but still want to sit on
it for a bit, there's a couple of questions yet to be resolved.

Signed-off-by: Hugh Dickins <hughd@xxxxxxxxxx>
Reported-by: Dmitry Vyukov <dvyukov@xxxxxxxxxx>
Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
---

 mm/shmem.c |   34 ++++++++++++++--------------------
 1 file changed, 14 insertions(+), 20 deletions(-)

diff -puN mm/shmem.c~tmpfs-fix-shmem_evict_inode-warnings-on-i_blocks mm/shmem.c

--- a/mm/shmem.c~tmpfs-fix-shmem_evict_inode-warnings-on-i_blocks
+++ a/mm/shmem.c
@@ -843,14 +843,14 @@ static int shmem_writepage(struct page *
 		list_add_tail(&info->swaplist, &shmem_swaplist);
 
 	if (add_to_swap_cache(page, swap, GFP_ATOMIC) == 0) {
-		swap_shmem_alloc(swap);
-		shmem_delete_from_page_cache(page, swp_to_radix_entry(swap));
-
 		spin_lock(&info->lock);
-		info->swapped++;
 		shmem_recalc_inode(inode);
+		info->swapped++;
 		spin_unlock(&info->lock);
 
+		swap_shmem_alloc(swap);
+		shmem_delete_from_page_cache(page, swp_to_radix_entry(swap));
+
 		mutex_unlock(&shmem_swaplist_mutex);
 		BUG_ON(page_mapped(page));
 		swap_writepage(page, wbc);
@@ -1078,7 +1078,7 @@ repeat:
 	if (sgp != SGP_WRITE && sgp != SGP_FALLOC &&
 	    ((loff_t)index << PAGE_CACHE_SHIFT) >= i_size_read(inode)) {
 		error = -EINVAL;
-		goto failed;
+		goto unlock;
 	}
 
 	if (page && sgp == SGP_WRITE)
@@ -1246,11 +1246,15 @@ clear:
 	/* Perhaps the file has been truncated since we checked */
 	if (sgp != SGP_WRITE && sgp != SGP_FALLOC &&
 	    ((loff_t)index << PAGE_CACHE_SHIFT) >= i_size_read(inode)) {
+		if (alloced) {
+			ClearPageDirty(page);
+			delete_from_page_cache(page);
+			spin_lock(&info->lock);
+			shmem_recalc_inode(inode);
+			spin_unlock(&info->lock);
+		}
 		error = -EINVAL;
-		if (alloced)
-			goto trunc;
-		else
-			goto failed;
+		goto unlock;
 	}
 	*pagep = page;
 	return 0;
@@ -1258,23 +1262,13 @@ clear:
 	/*
 	 * Error recovery.
 	 */
-trunc:
-	info = SHMEM_I(inode);
-	ClearPageDirty(page);
-	delete_from_page_cache(page);
-	spin_lock(&info->lock);
-	info->alloced--;
-	inode->i_blocks -= BLOCKS_PER_PAGE;
-	spin_unlock(&info->lock);
 decused:
-	sbinfo = SHMEM_SB(inode->i_sb);
 	if (sbinfo->max_blocks)
 		percpu_counter_add(&sbinfo->used_blocks, -1);
 unacct:
 	shmem_unacct_blocks(info->flags, 1);
 failed:
-	if (swap.val && error != -EINVAL &&
-	    !shmem_confirm_swap(mapping, index, swap))
+	if (swap.val && !shmem_confirm_swap(mapping, index, swap))
 		error = -EEXIST;
 unlock:
 	if (page) {
_

Patches currently in -mm which might be from hughd@xxxxxxxxxx are

osd-fs-__r4w_get_page-rely-on-pageuptodate-for-uptodate.patch
tmpfs-fix-shmem_evict_inode-warnings-on-i_blocks.patch

--
To unsubscribe from this list: send the line "unsubscribe mm-commits" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html