Re: [MM Bug?] mmap() triggers SIGBUS while doing the numa_move_pages() for offlined hugepage in background

Mike Kravetz <mike.kravetz@xxxxxxxxxx> · Thu, 1 Aug 2019 17:19:41 -0700

On 7/30/19 5:44 PM, Mike Kravetz wrote:
> A SIGBUS is the normal behavior for a hugetlb page fault failure due to
> lack of huge pages.  Ugly, but that is the design.  I do not believe this
> test should be experiencing this due to reservations taken at mmap
> time.  However, the test is combining faults, soft offline and page
> migrations, so there are lots of moving parts.
> 
> I'll continue to investigate.

There appears to be a race with hugetlb_fault and try_to_unmap_one of
the migration path.

Can you try this patch in your environment?  I am not sure if it will
be the final fix, but I just wanted to see if it addresses the issue for you.

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index ede7e7f5d1ab..f3156c5432e3 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -3856,6 +3856,20 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
 
 		page = alloc_huge_page(vma, haddr, 0);
 		if (IS_ERR(page)) {
+			/*
+			 * We could race with page migration (try_to_unmap_one)
+			 * which is modifying page table with lock.  However,
+			 * we are not holding lock here.  Before returning
+			 * error that will SIGBUS caller, get ptl and make
+			 * sure there really is no entry.
+			 */
+			ptl = huge_pte_lock(h, mm, ptep);
+			if (!huge_pte_none(huge_ptep_get(ptep))) {
+				ret = 0;
+				spin_unlock(ptl);
+				goto out;
+			}
+			spin_unlock(ptl);
 			ret = vmf_error(PTR_ERR(page));
 			goto out;
 		}
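
For anyone who wants to reason about the pattern outside the kernel, here is
a minimal user-space sketch of the same idea: when allocation fails, take the
lock and recheck the entry before reporting an error.  Everything in it
(struct fake_pte, fake_alloc, fake_no_page, the pthread mutex standing in for
the page table lock) is hypothetical and made up for illustration.  It is not
the hugetlb code and does not model reservations or migration entries.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for a single huge page table entry. */
struct fake_pte {
	pthread_mutex_t lock;	/* plays the role of the page table lock */
	bool present;		/* true once some path installed an entry */
};

/* Hypothetical allocator: succeeds only while free pages remain. */
static int fake_alloc(int *free_pages)
{
	if (*free_pages <= 0)
		return -ENOSPC;
	(*free_pages)--;
	return 0;
}

/*
 * Returns 0 if the entry is populated when we return (by us or by a
 * racing path), or a negative errno if allocation failed and the entry
 * is still empty after rechecking under the lock.
 */
static int fake_no_page(struct fake_pte *pte, int *free_pages)
{
	int err;

	assert(pte != NULL && free_pages != NULL);

	err = fake_alloc(free_pages);
	if (err) {
		bool raced;

		/* Allocation ran without the lock; recheck before failing. */
		pthread_mutex_lock(&pte->lock);
		raced = pte->present;
		pthread_mutex_unlock(&pte->lock);
		return raced ? 0 : err;
	}

	pthread_mutex_lock(&pte->lock);
	if (pte->present)
		(*free_pages)++;	/* lost the race: give the page back */
	else
		pte->present = true;
	pthread_mutex_unlock(&pte->lock);
	return 0;
}
```

In this sketch a return of 0 corresponds to "ret = 0" in the patch above: the
faulting access is simply retried instead of the caller being sent an error.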
