+ hugetlbfs-fix-hugetlb-page-migration-fault-race-causing-sigbus.patch added to -mm tree

akpm@xxxxxxxxxxxxxxxxxxxx · Thu, 08 Aug 2019 16:41:45 -0700

The patch titled
     Subject: hugetlbfs: fix hugetlb page migration/fault race causing SIGBUS
has been added to the -mm tree.  Its filename is
     hugetlbfs-fix-hugetlb-page-migration-fault-race-causing-sigbus.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/hugetlbfs-fix-hugetlb-page-migration-fault-race-causing-sigbus.patch
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/hugetlbfs-fix-hugetlb-page-migration-fault-race-causing-sigbus.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Mike Kravetz <mike.kravetz@xxxxxxxxxx>
Subject: hugetlbfs: fix hugetlb page migration/fault race causing SIGBUS

Li Wang discovered that LTP/move_pages12 V2 sometimes triggers SIGBUS when
testing kernel v5.2.3.  This is caused by a race between hugetlb page
migration and page fault.

If a hugetlb page cannot be allocated to satisfy a page fault, the task
is sent SIGBUS.  This is normal hugetlbfs behavior.  A hugetlb fault mutex
exists to prevent two tasks from trying to instantiate the same page. 
This protects against the situation where there is only one hugetlb page,
and both tasks would try to allocate.  Without the mutex, one would fail
and SIGBUS even though the other fault would be successful.

There is a similar race between hugetlb page migration and fault. 
Migration code will allocate a page for the target of the migration.  It
will then unmap the original page from all page tables.  It does this
unmap by first clearing the pte and then writing a migration entry.  The
page table lock is held for the duration of this clear and write
operation.  However, the beginning of the hugetlb page fault code
optimistically checks the pte without taking the page table lock.  If the
pte is clear (as it can be during the migration unmap operation), a hugetlb
page allocation is attempted to satisfy the fault.  Note that the page which
will eventually satisfy this fault was already allocated by the migration
code.  However, the allocation within the fault path could fail which
would result in the task incorrectly being sent SIGBUS.
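
The window described above can be illustrated with a small user-space
model.  This is only a sketch under stated assumptions, not kernel code:
struct race_model, the migrate_*() helpers and fault_would_sigbus() are
hypothetical names invented for illustration, and the pool is assumed to
hold exactly one free huge page before migration starts.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model types; none of these names exist in the kernel. */
enum pte_state { PTE_PRESENT, PTE_NONE, PTE_MIGRATION };

struct race_model {
	enum pte_state pte;	/* what an unlocked reader would observe */
	int free_huge_pages;	/* pages left in the hugetlb pool */
};

/* Migration step 1: take a pool page for the migration target. */
static void migrate_alloc_target(struct race_model *m)
{
	assert(m->free_huge_pages > 0);
	m->free_huge_pages--;
}

/* Migration step 2a: clear the pte (done under the page table lock). */
static void migrate_clear_pte(struct race_model *m)
{
	m->pte = PTE_NONE;
}

/* Migration step 2b: write the migration entry (same lock hold). */
static void migrate_write_entry(struct race_model *m)
{
	m->pte = PTE_MIGRATION;
}

/*
 * Fault path before the fix: check the pte without the lock and, if it
 * looks empty, try to allocate.  Returns true when the task would be
 * sent SIGBUS because the pool is exhausted.
 */
static bool fault_would_sigbus(const struct race_model *m)
{
	if (m->pte != PTE_NONE)
		return false;	/* an entry exists; no allocation attempted */
	return m->free_huge_pages == 0;
}
```

In this model a fault that runs between migrate_clear_pte() and
migrate_write_entry() sees an empty pte and an empty pool, so it reports
SIGBUS even though the migration target will satisfy the mapping.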

Ideally, we could take the hugetlb fault mutex in the migration code when
modifying the page tables.  However, locks must be taken in the order of
hugetlb fault mutex, page lock, page table lock.  This would require
significant rework of the migration code.  Instead, the issue is addressed
in the hugetlb fault code.  After failing to allocate a huge page, take
the page table lock and check for huge_pte_none before returning an error.
This is the same check that must be made further in the code even if page
allocation is successful.
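
The recheck can be sketched in user space as follows.  This is a minimal
model under stated assumptions rather than the kernel implementation: a
pthread mutex stands in for the page table lock, and struct model_pte,
enum fault_result and fault_after_alloc_failure() are hypothetical names
used only for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these names are kernel APIs. */
enum model_pte_state { MODEL_PTE_NONE, MODEL_PTE_MIGRATION, MODEL_PTE_PRESENT };

struct model_pte {
	pthread_mutex_t lock;		/* stands in for the page table lock */
	enum model_pte_state state;
};

enum fault_result { FAULT_NO_ERROR = 0, FAULT_SIGBUS = 1 };

/*
 * Called only after the huge page allocation has already failed.
 * Report an error only if the entry is still empty once the lock is
 * held; any other entry means another path populated it in the meantime.
 */
static enum fault_result fault_after_alloc_failure(struct model_pte *pte)
{
	enum fault_result ret = FAULT_SIGBUS;

	assert(pte != NULL);

	pthread_mutex_lock(&pte->lock);
	if (pte->state != MODEL_PTE_NONE)
		ret = FAULT_NO_ERROR;	/* not a genuine allocation failure */
	pthread_mutex_unlock(&pte->lock);

	return ret;
}
```

Because the model migration would clear and rewrite the entry while
holding the same lock, a reader that holds the lock can never observe the
transient empty state, which is what makes the recheck reliable.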

Link: http://lkml.kernel.org/r/20190808000533.7701-1-mike.kravetz@xxxxxxxxxx
Fixes-no-stable: 290408d4a250 ("hugetlb: hugepage migration core")
Signed-off-by: Mike Kravetz <mike.kravetz@xxxxxxxxxx>
Reported-by: Li Wang <liwang@xxxxxxxxxx>
Tested-by: Li Wang <liwang@xxxxxxxxxx>
Reviewed-by: Naoya Horiguchi <n-horiguchi@xxxxxxxxxxxxx>
Acked-by: Michal Hocko <mhocko@xxxxxxxx>
Cc: Cyril Hrubis <chrubis@xxxxxxx>
Cc: Xishi Qiu <xishi.qiuxishi@xxxxxxxxxxxxxxx>
Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
---

 mm/hugetlb.c |   19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

--- a/mm/hugetlb.c~hugetlbfs-fix-hugetlb-page-migration-fault-race-causing-sigbus
+++ a/mm/hugetlb.c
@@ -3856,6 +3856,25 @@ retry:
 
 		page = alloc_huge_page(vma, haddr, 0);
 		if (IS_ERR(page)) {
+			/*
+			 * Returning error will result in faulting task being
+			 * sent SIGBUS.  The hugetlb fault mutex prevents two
+			 * tasks from racing to fault in the same page which
+			 * could result in false unable to allocate errors.
+			 * Page migration does not take the fault mutex, but
+			 * does a clear then write of pte's under page table
+			 * lock.  Page fault code could race with migration,
+			 * notice the clear pte and try to allocate a page
+			 * here.  Before returning error, get ptl and make
+			 * sure there really is no pte entry.
+			 */
+			ptl = huge_pte_lock(h, mm, ptep);
+			if (!huge_pte_none(huge_ptep_get(ptep))) {
+				ret = 0;
+				spin_unlock(ptl);
+				goto out;
+			}
+			spin_unlock(ptl);
 			ret = vmf_error(PTR_ERR(page));
 			goto out;
 		}
_

Patches currently in -mm which might be from mike.kravetz@xxxxxxxxxx are

hugetlbfs-fix-hugetlb-page-migration-fault-race-causing-sigbus.patch
hugetlbfs-dont-retry-when-pool-page-allocations-start-to-fail.patch