Re: [RFC PATCH v2 0/7] synchronously scan and reclaim empty user PTE pages

Hi all,

On 2024/8/5 20:55, Qi Zheng wrote:

[...]


2. When we use mmu_gather to batch the TLB flush and free PTE pages, the TLB is
    not flushed before the pmd lock is released. This may result in the
    following two situations:

    1) Userland can trigger a page fault and install a huge page, so that a
       small-page TLB entry and a huge-page TLB entry coexist for the same
       address.

    2) Userland can also trigger a page fault and install a new PTE page, so
       that two small-page TLB entries coexist, each derived from a different
       PTE page.

    For case 1), according to Intel's TLB Application Note (317080), some x86
    CPUs do not allow it:

    ```
    If software modifies the paging structures so that the page size used for a
    4-KByte range of linear addresses changes, the TLBs may subsequently contain
    both ordinary and large-page translations for the address range. A reference
    to a linear address in the address range may use either translation. Which of
    the two translations is used may vary from one execution to another and the
    choice may be implementation-specific.

    Software wishing to prevent this uncertainty should not write to a paging-
    structure entry in a way that would change, for any linear address, both the
    page size and either the page frame or attributes. It can instead use the
    following algorithm: first mark the relevant paging-structure entry (e.g.,
    PDE) not present; then invalidate any translations for the affected linear
    addresses (see Section 5.2); and then modify the relevant paging-structure
    entry to mark it present and establish translation(s) for the new page size.
    ```

    More background can be found in the comment above the pmdp_invalidate()
    call in __split_huge_pmd_locked().
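
The ordering described in the application note can be illustrated with a small userspace model. This is only a sketch: every `toy_*` name below is hypothetical, nothing here is kernel code, and the "racing access" is simulated by an ordinary function call at the point where another CPU could touch the address.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy userspace model; none of these types or helpers exist in the kernel. */
enum toy_size { TOY_4K, TOY_2M };

struct toy_pmd { bool present; enum toy_size size; };
struct toy_tlb { bool has_4k; bool has_2m; };

/* A memory access caches a translation of the size the entry describes. */
static void toy_access(struct toy_tlb *tlb, const struct toy_pmd *pmd)
{
    if (!pmd->present)
        return; /* the access would fault; nothing is cached */
    if (pmd->size == TOY_4K)
        tlb->has_4k = true;
    else
        tlb->has_2m = true;
}

static void toy_flush(struct toy_tlb *tlb)
{
    tlb->has_4k = false;
    tlb->has_2m = false;
}

/* Rewrites the entry in place and flushes afterwards.
 * Returns true if both page sizes were cached at the same time. */
static bool toy_switch_in_place(struct toy_tlb *tlb, struct toy_pmd *pmd)
{
    bool mixed;

    assert(pmd->present && pmd->size == TOY_4K);
    toy_access(tlb, pmd);   /* a 4K translation gets cached */
    pmd->size = TOY_2M;     /* entry rewritten while still present */
    toy_access(tlb, pmd);   /* simulated racing access before the flush */
    mixed = tlb->has_4k && tlb->has_2m;
    toy_flush(tlb);         /* the deferred flush arrives too late */
    return mixed;
}

/* Follows the quoted algorithm: not present, invalidate, then install. */
static bool toy_switch_with_invalidate(struct toy_tlb *tlb, struct toy_pmd *pmd)
{
    bool mixed;

    assert(pmd->present && pmd->size == TOY_4K);
    toy_access(tlb, pmd);   /* a 4K translation gets cached */
    pmd->present = false;   /* step 1: mark the entry not present */
    toy_access(tlb, pmd);   /* simulated racing access caches nothing */
    mixed = tlb->has_4k && tlb->has_2m;
    toy_flush(tlb);         /* step 2: invalidate old translations */
    pmd->size = TOY_2M;
    pmd->present = true;    /* step 3: install the new page size */
    toy_access(tlb, pmd);
    return mixed || (tlb->has_4k && tlb->has_2m);
}
```

In this model only `toy_switch_in_place()` ever leaves a 4K and a 2M translation cached together, which is the state the application note warns about.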

    For case 2), the comment above ptep_clear_flush() in wp_page_copy()
    indicates that this situation is not allowed either. Even without this
    patch series, madvise(MADV_DONTNEED) can already cause it:

            CPU 0                         CPU 1

    madvise(MADV_DONTNEED)
    -->  clear pte entry
         pte_unmap_unlock
                                       touch and tlb miss
                                       --> set pte entry
         mmu_gather flush tlb

    Strangely, though, I could not find any code that handles this. Maybe I
    missed something, or is this expected to be guaranteed by userland?

I'm still quite confused about this. Is anyone familiar with this part?

Thanks,
Qi


    Anyway, this series defines the following two hooks for architectures to
    implement. An architecture that does not allow the above two situations
    should define them to flush the TLB before set_pmd_at().

    - arch_flush_tlb_before_set_huge_page
    - arch_flush_tlb_before_set_pte_page
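
As a rough illustration of how such a hook could be wired up, here is a userspace sketch using the usual "generic no-op unless the architecture overrides it" pattern. The hook name comes from the series, but the `(mm, addr)` signature, `struct toy_mm`, `TOY_ARCH_NEEDS_FLUSH`, and the `toy_*` helpers are assumptions made for this example and are not taken from the patches.

```c
#include <assert.h>

/* Hypothetical stand-in for an mm; it only records what happened. */
struct toy_mm {
    unsigned long flushes;            /* number of TLB flushes issued */
    unsigned long flushes_seen_at_set; /* flush count when the PMD was set */
    unsigned long pmd;                /* last value installed */
};

/* Assumed knob: an architecture that forbids mixed translations sets this. */
#define TOY_ARCH_NEEDS_FLUSH 1

#if TOY_ARCH_NEEDS_FLUSH
/* "Architecture" override: flush before the new entry becomes visible. */
static void arch_flush_tlb_before_set_huge_page(struct toy_mm *mm,
                                                unsigned long addr)
{
    (void)addr;
    mm->flushes++;
}
#define arch_flush_tlb_before_set_huge_page arch_flush_tlb_before_set_huge_page
#endif

#ifndef arch_flush_tlb_before_set_huge_page
/* Generic fallback: architectures that tolerate the situation do nothing. */
static inline void arch_flush_tlb_before_set_huge_page(struct toy_mm *mm,
                                                       unsigned long addr)
{
    (void)mm;
    (void)addr;
}
#endif

/* Toy replacement for set_pmd_at(); remembers how many flushes preceded it. */
static void toy_set_pmd_at(struct toy_mm *mm, unsigned long addr,
                           unsigned long val)
{
    (void)addr;
    mm->flushes_seen_at_set = mm->flushes;
    mm->pmd = val;
}

/* Caller side: invoke the hook, then install the entry. */
static void toy_install_huge_page(struct toy_mm *mm, unsigned long addr,
                                  unsigned long val)
{
    assert(mm);
    arch_flush_tlb_before_set_huge_page(mm, addr);
    toy_set_pmd_at(mm, addr, val);
}
```

With `TOY_ARCH_NEEDS_FLUSH` set to 0 the fallback compiles to nothing, so architectures that permit the two situations would pay no cost in this sketch.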


[...]





