On 2016/7/20 15:38, Michal Hocko wrote:
> [CC Mike and Naoya]
>
> On Tue 19-07-16 21:45:58, zhongjiang wrote:
>> From: zhong jiang <zhongjiang@xxxxxxxxxx>
>>
>> I hit the following BUG_ON in huge_pte_alloc when running a database
>> and doing online/offline memory on the system:
>>
>> 	BUG_ON(pte && !pte_none(*pte) && !pte_huge(*pte));
>>
>> When pmd sharing is enabled, we may obtain a shared pmd entry. Due to
>> an ongoing memory offline, the page the pmd entry points to can come
>> under migration, so the entry becomes a migration entry and the
>> BUG_ON triggers.
>>
>> The patch fixes this by re-checking the pmd entry once we hold the
>> lock: if the shared pmd entry points to a page under migration, we
>> fall back to allocating a new pmd entry.
>
> I am still not 100% sure this is correct. Does huge_pte_lockptr work
> properly for the migration swapentry? If yes and we populate the pud
> with a migration entry then is it really bad/harmful (other than hitting
> the BUG_ON which might be updated to handle that case)? This might be a
> stupid question, sorry about that, but I have real problems grasping
> the whole issue properly and the changelog didn't help me much. I would
> really appreciate some clarification here. The pmd sharing code is clear
> as mud and adding new tweaks there doesn't sound like it would make it
> more clear.

OK, maybe the following will explain it better:

	cpu0				cpu1
	try_to_unmap_one		huge_pmd_share
	page_check_address		huge_pte_lockptr
	spin_lock
	(page entry can be set to a
	 migration or poison entry)
	pte_unmap_unlock
					spin_lock
					(page entry has changed)

> Also is the hwpoison check really needed?

Yes. The page can be poisoned before the spin_lock in try_to_unmap_one,
and as you can see there, try_to_unmap_one will also set the page entry
to a hwpoison entry if PageHWPoison(page) is true.
>> Signed-off-by: zhong jiang <zhongjiang@xxxxxxxxxx>
>> ---
>>  mm/hugetlb.c | 9 ++++++++-
>>  1 file changed, 8 insertions(+), 1 deletion(-)
>>
>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>> index 6384dfd..797db55 100644
>> --- a/mm/hugetlb.c
>> +++ b/mm/hugetlb.c
>> @@ -4213,7 +4213,7 @@ pte_t *huge_pmd_share(struct mm_struct *mm, unsigned long addr, pud_t *pud)
>>  	struct vm_area_struct *svma;
>>  	unsigned long saddr;
>>  	pte_t *spte = NULL;
>> -	pte_t *pte;
>> +	pte_t *pte, entry;
>>  	spinlock_t *ptl;
>>
>>  	if (!vma_shareable(vma, addr))
>> @@ -4240,6 +4240,11 @@ pte_t *huge_pmd_share(struct mm_struct *mm, unsigned long addr, pud_t *pud)
>>
>>  	ptl = huge_pte_lockptr(hstate_vma(vma), mm, spte);
>>  	spin_lock(ptl);
>> +	entry = huge_ptep_get(spte);
>> +	if (is_hugetlb_entry_migration(entry) ||
>> +	    is_hugetlb_entry_hwpoisoned(entry)) {
>> +		goto out_unlock;
>> +	}
>>  	if (pud_none(*pud)) {
>>  		pud_populate(mm, pud,
>>  			(pmd_t *)((unsigned long)spte & PAGE_MASK));
>> @@ -4247,6 +4252,8 @@ pte_t *huge_pmd_share(struct mm_struct *mm, unsigned long addr, pud_t *pud)
>>  		put_page(virt_to_page(spte));
>>  		mm_dec_nr_pmds(mm);
>>  	}
>> +
>> +out_unlock:
>>  	spin_unlock(ptl);
>>  out:
>>  	pte = (pte_t *)pmd_alloc(mm, pud, addr);
>> --
>> 1.8.3.1