On Tuesday 13 October 2015 09:37 PM, Kirill A. Shutemov wrote: > On Tue, Oct 13, 2015 at 01:58:39PM +0000, Vineet Gupta wrote: >> Hi Kirill, >> >> I'm running LTP tests on the new ARC THP code and thp03 seems to be triggering mm >> spew. >> >> --------------->8--------------------- >> [ARCLinux]# ./ltp-thp03-extract >> PID 60 >> bad pmd bf1c4600 be600231 >> ../mm/pgtable-generic.c:34: bad pgd be600231. >> bad pmd bf1c4604 bd800231 >> ../mm/pgtable-generic.c:34: bad pgd bd800231. >> BUG: Bad rss-counter state mm:bf12e900 idx:1 val:512 >> BUG: non-zero nr_ptes on freeing mm: 2 >> --------------->8--------------------- >> >> I know what exactly is happening and the likely fix, but would want to get some >> thoughts from you if possible. >> >> background: ARC is software page walked with PGD -> PTE -> page for normal and PMD >> -> page for THP case. A vanilla PGD doesn't have any flags - only pointer to PTE >> >> A reduced version of thp03 allocates a THP, dirties it, followed by >> mprotect(PROT_NONE). >> At the time of mprotect() -> change_huge_pmd() -> pmd_modify() needs to change >> some of the bits. >> >> The issue is ARC implementation of pmd_modify() based on pte variant, which >> retains the soft pte bits (dirty and accessed). >> >> static inline pmd_t pmd_modify(pmd_t pmd, pgprot_t newprot) >> { >> return pte_pmd(pte_modify(pmd_pte(pmd), newprot)); >> } >> >> Obvious fix is to rewrite pmd_modify() so that it clears out all pte type flags >> but that assumes PMD is becoming PGD (a vanilla PGD on ARC doesn't have any >> flags). Can we have pmd_modify() ever be called for NOT splitting pmd e.g. >> mprotect Write to Read which won't split the THP like it does now and simply >> changes the prot flags. My proposed version of pmd_modify() will loose the dirty bit. > Hm? pmd_modify() is nothing to do with splitting. The mprotect() codepath > you've mentioned above calls pmd_modify() only if the THP is fully in > mprotect range. Indeed my mental picture of this was messed up - specially because behind the back, pmd_modify() for ARC (based on pte_modify()) was buggered to clear the huge page bit itself :-) So we had a THP PMD which would start failing for pmd_trans_huge() and thus treated like a normal PGD. But it had the leftover PMD soft bits, which triggered the MM spew. The localized fix is below, while better fix is to make pte_modify() only clear R-W-X bits (currently it clears everything except soft accessed/dirty bits) static inline pmd_t pmd_modify(pmd_t pmd, pgprot_t newprot) { - return pte_pmd(pte_modify(pmd_pte(pmd), newprot)); + /* + * open-coded pte_modify() additionally retaining HW_SZ bit + * otherwise, pmd_trans_huge() checks start failing + */ + return __pmd((pmd_val(pmd) & (_PAGE_CHG_MASK | _PAGE_HW_SZ)) | pgprot_val(newprot)); } > >> In short, what are the semantics of pmd_modify() - essentially does it imply pmd >> is being split so are free to make it like PGD. > No, pmd_modify() cannot make such assumption. That's just not true -- we > don't split PMD in such codepath. And even if we do, we construct new PMD > entry from scratch instead of modifying existing one. > > So the semantics of pmd_modify(): you can assume that the entry is > pmd_large(), going to stay this way and you need to touch only > protection-related bit. Thx ! -Vineet -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href