On 4/27/24 20:37, Zi Yan wrote: > On 27 Apr 2024, at 0:25, John Hubbard wrote: > >> On 4/26/24 7:53 AM, Zi Yan wrote: >> >> Hi Zi (and Ryan)! >> >>>>>>> lockless pgtable walker could see the migration entry pmd in this state >>>>>>> and start interpretting the fields as if it were present, leading to >>>>>>> BadThings (TM). GUP-fast appears to be one such lockless pgtable walker. >>>>>> Could you please explain how bad things might happen ? >>>>> See 2 places where pmdp_get_lockless() is called in gup.c, without the PTL. >>>>> These could both return the swap pte for which pmd_mkinvalid() has been called. >>>>> In both cases, this would lead to the pmd_present() check eroneously returning >>>>> true, eventually causing incorrect interpretation of the pte fields. e.g.: >>>>> >>>>> gup_pmd_range() >>>>> pmd_t pmd = pmdp_get_lockless(pmdp); >>>>> gup_huge_pmd(pmd, ...) >>>>> page = nth_page(pmd_page(orig), (addr & ~PMD_MASK) >> PAGE_SHIFT); >>>>> >>>>> page is guff. >>>>> >>>>> Let me know what you think! >>> Add JohnH to check GUP code. >> Ryan is correct about this behavior. >> >> By the way, remember that gup is not the only lockless page table >> walker: there is also the CPU hardware itself, which inconveniently >> refuses to bother with taking page table locks. 🙂 >> >> So if we have code that can make a non-present PTE appear to be present >> to any of these page walkers, whether software or hardware, it's a >> definitely Not Good and will lead directly to bugs. > This issue does not bother hardware, because the PTE_VALID/PMD_SECT_VALID > is always unset and hardware always sees this PMD as invalid. It is a pure > software issue, since for THP splitting, we do not want hardware to access > the page but still allow kernel to user pmd_page() to get the pfn, so > pmd_present() returns true even if PTE_VALID/PMD_SECT_VALID is unset by > setting and checking PMD_PRESENT_INVALID bit. pmd_mkinvalid() sets > PMD_PRESENT_INVALID, turning a migration entry from !pmd_present() to > pmd_present(), while it is always a invalid PMD to hardware. Agreed, this is not a HW issue at all, MMU sees such an entry as invalid even if pmd_present() returns true.