+ mm-migrate-dont-rely-on-pagemovable-of-newpage-after-unlocking-it.patch added to -mm tree

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



The patch titled
     Subject: mm: migrate: don't rely on PageMovable() of newpage after unlocking it
has been added to the -mm tree.  Its filename is
     mm-migrate-dont-rely-on-pagemovable-of-newpage-after-unlocking-it.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/mm-migrate-dont-rely-on-pagemovable-of-newpage-after-unlocking-it.patch
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/mm-migrate-dont-rely-on-pagemovable-of-newpage-after-unlocking-it.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: David Hildenbrand <david@xxxxxxxxxx>
Subject: mm: migrate: don't rely on PageMovable() of newpage after unlocking it

While debugging some crashes related to virtio-balloon deflation that
happened under the old balloon migration code, I stumbled over a race that
still exists today.

What we experienced:

drivers/virtio/virtio_balloon.c:release_pages_balloon():
- WARNING: CPU: 13 PID: 6586 at lib/list_debug.c:59 __list_del_entry+0xa1/0xd0
- list_del corruption. prev->next should be ffffe253961090a0, but was dead000000000100

Turns out after having added the page to a local list when dequeuing, the
page would suddenly be moved to an LRU list before we would free it via
the local list, corrupting both lists.  So a page we own and that is !LRU
was moved to an LRU list.

In __unmap_and_move(), we lock the old and newpage and perform the
migration.  In case of vitio-balloon, the new page will become movable,
the old page will no longer be movable.

However, after unlocking newpage, there is nothing stopping the newpage
from getting dequeued and freed by virtio-balloon. This
will result in the newpage
1. No longer having PageMovable()
2. Getting moved to the local list before finally freeing it (using
   page->lru)

Back in the migration thread in __unmap_and_move(), we would after
unlocking the newpage suddenly no longer have PageMovable(newpage) and
will therefore call putback_lru_page(newpage), modifying page->lru
although that list is still in use by virtio-balloon.

To summarize, we have a race between migrating the newpage and checking
for PageMovable(newpage).  Instead of checking PageMovable(newpage), we
can simply rely on is_lru of the original page.

Looks like this was introduced by d6d86c0a7f8d ("mm/balloon_compaction:
redesign ballooned pages management"), which was backported up to 3.12. 
Old compaction code used PageBalloon() via -_is_movable_balloon_page()
instead of PageMovable(), however with the same semantics.

Link: http://lkml.kernel.org/r/20190128160403.16657-1-david@xxxxxxxxxx
Fixes: d6d86c0a7f8d ("mm/balloon_compaction: redesign ballooned pages management")
Signed-off-by: David Hildenbrand <david@xxxxxxxxxx>
Reported-by: Vratislav Bendel <vbendel@xxxxxxxxxx>
Acked-by: Michal Hocko <mhocko@xxxxxxxx>
Acked-by: Rafael Aquini <aquini@xxxxxxxxxx>
Cc: Mel Gorman <mgorman@xxxxxxxxxxxxxxxxxxx>
Cc: "Kirill A. Shutemov" <kirill.shutemov@xxxxxxxxxxxxxxx>
Cc: Naoya Horiguchi <n-horiguchi@xxxxxxxxxxxxx>
Cc: Jan Kara <jack@xxxxxxx>
Cc: Andrea Arcangeli <aarcange@xxxxxxxxxx>
Cc: Dominik Brodowski <linux@xxxxxxxxxxxxxxxxxxxx>
Cc: Matthew Wilcox <willy@xxxxxxxxxxxxx>
Cc: Konstantin Khlebnikov <k.khlebnikov@xxxxxxxxxxx>
Cc: Minchan Kim <minchan@xxxxxxxxxx>
Cc: <stable@xxxxxxxxxxxxxxx>	[3.12+]
Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
---

 mm/migrate.c |    6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

--- a/mm/migrate.c~mm-migrate-dont-rely-on-pagemovable-of-newpage-after-unlocking-it
+++ a/mm/migrate.c
@@ -1135,10 +1135,12 @@ out:
 	 * If migration is successful, decrease refcount of the newpage
 	 * which will not free the page because new page owner increased
 	 * refcounter. As well, if it is LRU page, add the page to LRU
-	 * list in here.
+	 * list in here. Don't rely on PageMovable(newpage), as that could
+	 * already have changed after unlocking newpage (e.g.
+	 * virtio-balloon deflation).
 	 */
 	if (rc == MIGRATEPAGE_SUCCESS) {
-		if (unlikely(__PageMovable(newpage)))
+		if (unlikely(!is_lru))
 			put_page(newpage);
 		else
 			putback_lru_page(newpage);
_

Patches currently in -mm which might be from david@xxxxxxxxxx are

mm-balloon-update-comment-about-isolation-migration-compaction.patch
mm-convert-pg_balloon-to-pg_offline.patch
kexec-export-pg_offline-to-vmcoreinfo.patch
xen-balloon-mark-inflated-pages-pg_offline.patch
hv_balloon-mark-inflated-pages-pg_offline.patch
vmw_balloon-mark-inflated-pages-pg_offline.patch
vmw_balloon-mark-inflated-pages-pg_offline-v2.patch
pm-hibernate-use-pfn_to_online_page.patch
pm-hibernate-exclude-all-pageoffline-pages.patch
pm-hibernate-exclude-all-pageoffline-pages-v2.patch
mm-migrate-dont-rely-on-pagemovable-of-newpage-after-unlocking-it.patch




[Index of Archives]     [Kernel Archive]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]

  Powered by Linux