On Tue 06-08-19 17:07:25, Mike Kravetz wrote: > On 8/5/19 10:36 AM, Mike Kravetz wrote: > >>>>> Can you try this patch in your environment? I am not sure if it will > >>>>> be the final fix, but just wanted to see if it addresses issue for you. > >>>>> > >>>>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c > >>>>> index ede7e7f5d1ab..f3156c5432e3 100644 > >>>>> --- a/mm/hugetlb.c > >>>>> +++ b/mm/hugetlb.c > >>>>> @@ -3856,6 +3856,20 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm, > >>>>> > >>>>> page = alloc_huge_page(vma, haddr, 0); > >>>>> if (IS_ERR(page)) { > >>>>> + /* > >>>>> + * We could race with page migration (try_to_unmap_one) > >>>>> + * which is modifying page table with lock. However, > >>>>> + * we are not holding lock here. Before returning > >>>>> + * error that will SIGBUS caller, get ptl and make > >>>>> + * sure there really is no entry. > >>>>> + */ > >>>>> + ptl = huge_pte_lock(h, mm, ptep); > >>>>> + if (!huge_pte_none(huge_ptep_get(ptep))) { > >>>>> + ret = 0; > >>>>> + spin_unlock(ptl); > >>>>> + goto out; > >>>>> + } > >>>>> + spin_unlock(ptl); > >>>> > >>>> Thanks you for investigation, Mike. > >>>> I tried this change and found no SIGBUS, so it works well. > > Here is another way to address the issue. Take the hugetlb fault mutex in > the migration code when modifying the page tables. IIUC, the fault mutex > was introduced to prevent this same issue when there were two page faults > on the same page (and we were unable to allocate an 'extra' page). The > downside to such an approach is that we add more hugetlbfs specific code > to try_to_unmap_one. I would rather go with the hugetlb_no_page which is better isolated. -- Michal Hocko SUSE Labs