On 08/18/2020 02:43 PM, Jonathan Cameron wrote:
> On Mon, 17 Aug 2020 14:49:43 +0530
> Anshuman Khandual <anshuman.khandual@xxxxxxx> wrote:
>
>> pmd_present() and pmd_trans_huge() are expected to behave in the following
>> manner during various phases of a given PMD. It is derived from a previous
>> detailed discussion on this topic [1] and present THP documentation [2].
>>
>> pmd_present(pmd):
>>
>> - Returns true if pmd refers to system RAM with a valid pmd_page(pmd)
>> - Returns false if pmd does not refer to system RAM - Invalid pmd_page(pmd)
>>
>> pmd_trans_huge(pmd):
>>
>> - Returns true if pmd refers to system RAM and is a trans huge mapping
>>
>> -------------------------------------------------------------------------
>> |       PMD states      |       pmd_present     |       pmd_trans_huge  |
>> -------------------------------------------------------------------------
>> |       Mapped          |       Yes             |       Yes             |
>> -------------------------------------------------------------------------
>> |       Splitting       |       Yes             |       Yes             |
>> -------------------------------------------------------------------------
>> |       Migration/Swap  |       No              |       No              |
>> -------------------------------------------------------------------------
>>
>> The problem:
>>
>> PMD is first invalidated with pmdp_invalidate() before it's splitting. This
>> invalidation clears PMD_SECT_VALID as below.
>>
>> PMD Split -> pmdp_invalidate() -> pmd_mkinvalid -> Clears PMD_SECT_VALID
>>
>> Once PMD_SECT_VALID gets cleared, it results in pmd_present() return false
>> on the PMD entry. It will need another bit apart from PMD_SECT_VALID to
>> re-affirm pmd_present() as true during the THP split process. To comply
>> with above mentioned semantics, pmd_trans_huge() should also check
>> pmd_present() first before testing presence of an actual transparent huge
>> mapping.
>>
>> The solution:
>>
>> Ideally PMD_TYPE_SECT should have been used here instead. But it shares the
>> bit position with PMD_SECT_VALID which is used for THP invalidation.
>> Hence it will not be there for pmd_present() check after pmdp_invalidate().
>>
>> A new software defined PMD_PRESENT_INVALID (bit 59) can be set on the PMD
>> entry during invalidation which can help pmd_present() return true and in
>> recognizing the fact that it still points to memory.
>>
>> This bit is transient. During the split process it will be overridden by a
>> page table page representing normal pages in place of erstwhile huge page.
>> Other pmdp_invalidate() callers always write a fresh PMD value on the entry
>> overriding this transient PMD_PRESENT_INVALID bit, which makes it safe.
>>
>> [1]: https://lkml.org/lkml/2018/10/17/231
>> [2]: https://www.kernel.org/doc/Documentation/vm/transhuge.txt
>
> Hi Anshuman,
>
> One query on this.  From my reading of the ARM ARM, bit 59 is not
> an ignored bit.  The exact requirements for hardware to be using
> it are a bit complex though.
>
> It 'might' be safe to use it for this, but if so can we have a comment
> explaining why.  Also more than possible I'm misunderstanding things!

We are using bit 59 only when the entry is not active from the MMU's
perspective, i.e. when PMD_SECT_VALID is clear.
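
To make that concrete, here is a rough user-space model of the semantics
described above. It is only a sketch, not the code from the patch: every
model_ / MODEL_ name is made up for illustration, the bit 0 and bit 1
positions used for the valid and table bits are assumptions of the model,
and only bit 59 is taken from the patch description. The point it shows is
that the software bit is set only on the path that also clears the valid
bit, so the two are never set together in an entry the MMU would walk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified user-space model of a PMD entry. NOT kernel code. */
typedef uint64_t model_pmd_t;

/* Assumed bit positions for this model only. */
#define MODEL_PMD_SECT_VALID      (UINT64_C(1) << 0)
#define MODEL_PMD_TABLE_BIT       (UINT64_C(1) << 1)
/* Software bit 59, as named in the patch description. */
#define MODEL_PMD_PRESENT_INVALID (UINT64_C(1) << 59)

/* True when the MMU would treat the entry as a live mapping. */
static inline bool model_pmd_hw_valid(model_pmd_t pmd)
{
	return (pmd & MODEL_PMD_SECT_VALID) != 0;
}

/* Invalidate for a split: mark "present but invalid", then drop valid. */
static inline model_pmd_t model_pmd_mkinvalid(model_pmd_t pmd)
{
	pmd |= MODEL_PMD_PRESENT_INVALID;
	pmd &= ~MODEL_PMD_SECT_VALID;
	return pmd;
}

/* Present if valid for the MMU, or invalidated but still pointing at RAM. */
static inline bool model_pmd_present(model_pmd_t pmd)
{
	return model_pmd_hw_valid(pmd) ||
	       (pmd & MODEL_PMD_PRESENT_INVALID) != 0;
}

/* Huge mapping: non-empty, present, and not a pointer to a page table. */
static inline bool model_pmd_trans_huge(model_pmd_t pmd)
{
	return pmd != 0 && model_pmd_present(pmd) &&
	       (pmd & MODEL_PMD_TABLE_BIT) == 0;
}

/* Walk the three rows of the state table from the commit message. */
static inline void model_check_states(void)
{
	/* Hypothetical values: an output address with the valid bit set. */
	model_pmd_t mapped = UINT64_C(0x200000) | MODEL_PMD_SECT_VALID;
	model_pmd_t splitting = model_pmd_mkinvalid(mapped);
	/* Hypothetical swap/migration entry: non-zero, valid and bit 59 clear. */
	model_pmd_t swap = UINT64_C(0x1234) << 2;

	/* Mapped: present and huge. */
	assert(model_pmd_present(mapped) && model_pmd_trans_huge(mapped));

	/* Splitting: inactive for the MMU, still present and huge for software. */
	assert(!model_pmd_hw_valid(splitting));
	assert(model_pmd_present(splitting) && model_pmd_trans_huge(splitting));

	/* Migration/Swap: neither present nor huge. */
	assert(!model_pmd_present(swap) && !model_pmd_trans_huge(swap));
}
```

In this model, model_pmd_mkinvalid() is the only place the software bit is
set, and it clears the valid bit in the same step. Whether hardware can
still interpret bit 59 in an entry whose valid bit is clear is the question
raised above, and the model does not answer it. It only spells out the
invariant the patch relies on.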