On 2020-04-24 at 20:57 Jason Gunthorpe wrote: >On Fri, Apr 24, 2020 at 12:07:50PM +0800, Li Xinhai wrote: >> On 2020-04-24 at 02:38 Jason Gunthorpe wrote: >> >On Thu, Apr 23, 2020 at 11:14:28AM -0700, Mike Kravetz wrote: >> >> Cc a few people who have looked at huge_pte_offset() recently. >> >> >> >> On 4/23/20 5:49 AM, Li Xinhai wrote: >> >> > When huge_pte_offset() is called, the parameter sz can only be PUD_SIZE >> >> > or PMD_SIZE. >> >> > If sz is PUD_SIZE and code can reach pud, then *pud must be none, or >> >> > normal hugetlb entry, or non-present (migration or hwpoisoned) hugetlb >> >> > entry, and we can directly return pud. >> >> > When sz is PMD_SIZE, pud must be none or present, and if code can reach >> >> > pmd, we can directly return pmd. >> >> > >> >> > So, after this patch, the code is simplified by first check on the >> >> > parameter sz, and avoid unnecessary checks in current code. >> >> > >> >> > Signed-off-by: Li Xinhai <lixinhai.lxh@xxxxxxxxx> >> >> > Cc: Mike Kravetz <mike.kravetz@xxxxxxxxxx> >> >> > Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> >> >> > mm/hugetlb.c | 24 +++++++++--------------- >> >> > 1 file changed, 9 insertions(+), 15 deletions(-) >> >> > >> >> > diff --git a/mm/hugetlb.c b/mm/hugetlb.c >> >> > index bcabbe0..e1424f5 100644 >> >> > +++ b/mm/hugetlb.c >> >> > @@ -5365,8 +5365,8 @@ pte_t *huge_pte_offset(struct mm_struct *mm, >> >> > { >> >> > pgd_t *pgd; >> >> > p4d_t *p4d; >> >> > - pud_t *pud, pud_entry; >> >> > - pmd_t *pmd, pmd_entry; >> >> > + pud_t *pud; >> >> > + pmd_t *pmd; >> >> > >> >> > pgd = pgd_offset(mm, addr); >> >> > if (!pgd_present(*pgd)) >> >> > @@ -5376,22 +5376,16 @@ pte_t *huge_pte_offset(struct mm_struct *mm, >> >> > return NULL; >> >> > >> >> > pud = pud_offset(p4d, addr); >> >> > - pud_entry = READ_ONCE(*pud); >> >> > - if (sz != PUD_SIZE && pud_none(pud_entry)) >> >> > - return NULL; >> >> > - /* hugepage or swap? */ >> >> > - if (pud_huge(pud_entry) || !pud_present(pud_entry)) >> >> > + if (sz == PUD_SIZE) >> >> > + /* must be pud_huge or pud_none */ >> >> > return (pte_t *)pud; >> >> > - >> >> > - pmd = pmd_offset(pud, addr); >> >> > - pmd_entry = READ_ONCE(*pmd); >> >> > - if (sz != PMD_SIZE && pmd_none(pmd_entry)) >> >> > + if (!pud_present(*pud)) >> >> > return NULL; >> >> > - /* hugepage or swap? */ >> >> > - if (pmd_huge(pmd_entry) || !pmd_present(pmd_entry)) >> >> > - return (pte_t *)pmd; >> >> > + /* must have a valid entry and size to go further */ >> >> > >> >> > - return NULL; >> >> > + pmd = pmd_offset(pud, addr); >> >> >> >> Can we get here with sz = PMD_SIZE and pud_none(*pud)? Would that be >> >> an issue for the pmd_offset() call? >> > >> >Certainly pmd_offset() must only be called if the PUD entry is >> >pointing at a pmd level. >> > >> >AFAIK this means it should not be called on pud_none(), pud_huge() or >> >!pud_present() cases. >> >> The test of !pud_present(*pud) also block pud_none(*pud) > >Sure > >> , so when sz == PMD_SIZE, pmd_offset() only called with a valid PUD >> entry which point to PMD page table. > >But what prevents pud_huge? > if sz == PUD_SIZE, the 'return (pte_t*)pud' alrady end the function, which cover pud_huge() and pud_none(), because we the mapping is for PUD_SIZE huge page. So, there is no possibility for pmd_offset() been called with invalid pud entry. Below is the code I used for test which has BUG_ON, that should give more clear idea about the semantics of code path: ... pud = pud_offset(p4d, addr); if (sz == PUD_SIZE) { /* must be pud_huge or pud_none */ BUG_ON(!pud_huge(*pud) && !pud_none(*pud)); return (pte_t *)pud; // note that return valid pointer for pud_none() case, // instead of NULL, that is same semantics as existing code. } if (!pud_present(*pud)) return NULL; // note that only return NULL in case pud not present, // same sematics as existing code. /* must have a valid entry and size to go further */ BUG_ON(sz != PMD_SIZE); pmd = pmd_offset(pud, addr); /* must be pmd_huge or pmd_none */ BUG_ON(!pmd_huge(*pmd) && !pmd_none(*pmd)); return (pte_t *)pmd; // note that return valid pointer for pmd_none() case, // instead of NULL, that is same semantics as existing code. ... >This API seems kind of strange to be honest.. Should it be two >functions instead of a sz parameter? > >huge_pud_offset() and huge_pmd_offset() ? I think checking huge size then call to one of these two functions at caller site will involve many redundant code do branch work in one function is better. > >Jason