On 2024/8/6 11:31, Qi Zheng wrote:
Hi all,
On 2024/8/5 20:55, Qi Zheng wrote:
[...]
2. When we use mmu_gather to batch flush tlb and free PTE pages, the TLB
is not flushed before pmd lock is unlocked. This may result in the
following two situations:
1) Userland can trigger a page fault and fill a huge page, which will
   cause a small-size TLB entry and a huge TLB entry to exist for the
   same address.
2) Userland can also trigger a page fault and fill a PTE page, which
   will cause two small-size TLB entries to exist, but the PTE pages
   they map are different.
For case 1), according to Intel's TLB Application note (317080), some
x86 CPUs do not allow it:
```
If software modifies the paging structures so that the page size used
for a 4-KByte range of linear addresses changes, the TLBs may
subsequently contain both ordinary and large-page translations for the
address range. A reference to a linear address in the address range may
use either translation. Which of the two translations is used may vary
from one execution to another and the choice may be
implementation-specific.
Software wishing to prevent this uncertainty should not write to a
paging-structure entry in a way that would change, for any linear
address, both the page size and either the page frame or attributes. It
can instead use the following algorithm: first mark the relevant
paging-structure entry (e.g., PDE) not present; then invalidate any
translations for the affected linear addresses (see Section 5.2); and
then modify the relevant paging-structure entry to mark it present and
establish translation(s) for the new page size.
```
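The three steps in the quoted text can be sketched as a small user-space toy model. Everything below (the `toy_` types and helpers, the fixed-size TLB) is hypothetical and only illustrates the ordering; it is not kernel code and does not model real hardware behaviour.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: none of these names exist in the kernel. */
enum toy_page_size { TOY_4K, TOY_2M };

struct toy_pde {
	bool present;
	enum toy_page_size size;
};

#define TOY_TLB_SLOTS 8

struct toy_tlb {
	bool valid[TOY_TLB_SLOTS];
	enum toy_page_size size[TOY_TLB_SLOTS];
};

/* Cache a translation, as a hardware walk would after a TLB miss. */
static bool toy_tlb_fill(struct toy_tlb *tlb, const struct toy_pde *pde)
{
	if (!pde->present)
		return false;	/* a non-present entry is never cached */

	for (size_t i = 0; i < TOY_TLB_SLOTS; i++) {
		if (!tlb->valid[i]) {
			tlb->valid[i] = true;
			tlb->size[i] = pde->size;
			return true;
		}
	}
	return false;		/* toy TLB is full, no eviction modelled */
}

static void toy_tlb_invalidate(struct toy_tlb *tlb)
{
	for (size_t i = 0; i < TOY_TLB_SLOTS; i++)
		tlb->valid[i] = false;
}

/* True if both 4K and 2M translations are cached at the same time. */
static bool toy_tlb_has_mixed_sizes(const struct toy_tlb *tlb)
{
	bool seen_4k = false, seen_2m = false;

	for (size_t i = 0; i < TOY_TLB_SLOTS; i++) {
		if (!tlb->valid[i])
			continue;
		if (tlb->size[i] == TOY_4K)
			seen_4k = true;
		else
			seen_2m = true;
	}
	return seen_4k && seen_2m;
}

/* Ordering from the quoted text: not present, invalidate, then present. */
static void toy_change_page_size(struct toy_pde *pde, struct toy_tlb *tlb,
				 enum toy_page_size new_size)
{
	assert(pde != NULL && tlb != NULL);

	pde->present = false;		/* 1. mark the entry not present */
	toy_tlb_invalidate(tlb);	/* 2. invalidate old translations */
	pde->size = new_size;		/* 3. establish the new page size */
	pde->present = true;
}
```

Changing `size` directly on a present entry and then refilling leaves both sizes cached in this model, while going through `toy_change_page_size()` does not.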
We can also learn more information from the comments above
pmdp_invalidate() in __split_huge_pmd_locked().
For case 2), we can see from the comments above ptep_clear_flush() in
wp_page_copy() that this situation is also not allowed. Even without
this patch series, madvise(MADV_DONTNEED) can also cause this situation:
        CPU 0                           CPU 1

        madvise (MADV_DONTNEED)
        --> clear pte entry
            pte_unmap_unlock
                                        touch and tlb miss
                                        --> set pte entry
        mmu_gather flush tlb
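The interleaving above can be replayed deterministically in a user-space toy model. This is only a sketch: the `toy_` names and the pfn values are made up, and it assumes CPU 0 itself still caches the old translation when it clears the pte entry.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NR_CPUS 2

struct toy_tlb_entry {
	bool valid;
	unsigned long pfn;
};

/* One virtual address, one pte, one cached translation per CPU. */
struct toy_mm {
	bool pte_present;
	unsigned long pte_pfn;
	struct toy_tlb_entry tlb[TOY_NR_CPUS];
};

/* Touch the address on @cpu; a fault installs @fault_pfn. */
static void toy_access(struct toy_mm *mm, int cpu, unsigned long fault_pfn)
{
	assert(cpu >= 0 && cpu < TOY_NR_CPUS);

	if (mm->tlb[cpu].valid)
		return;		/* TLB hit: cached entry is used, stale or not */

	if (!mm->pte_present) {	/* page fault path: set pte entry */
		mm->pte_present = true;
		mm->pte_pfn = fault_pfn;
	}
	mm->tlb[cpu].valid = true;
	mm->tlb[cpu].pfn = mm->pte_pfn;
}

static void toy_clear_pte(struct toy_mm *mm)
{
	mm->pte_present = false;	/* note: no TLB flush here */
}

static void toy_flush_tlb_all(struct toy_mm *mm)
{
	for (int cpu = 0; cpu < TOY_NR_CPUS; cpu++)
		mm->tlb[cpu].valid = false;
}

/* True if both CPUs cache the address but point at different pages. */
static bool toy_tlbs_disagree(const struct toy_mm *mm)
{
	return mm->tlb[0].valid && mm->tlb[1].valid &&
	       mm->tlb[0].pfn != mm->tlb[1].pfn;
}

/* Replay the diagram; returns true if the inconsistent window was seen. */
static bool toy_replay_deferred_flush(void)
{
	struct toy_mm mm = { 0 };
	bool window;

	toy_access(&mm, 0, 100);	/* CPU 0 caches the old page */
	toy_clear_pte(&mm);		/* CPU 0: clear pte entry, unlock */
	toy_access(&mm, 1, 200);	/* CPU 1: tlb miss, set pte entry */
	window = toy_tlbs_disagree(&mm);
	toy_flush_tlb_all(&mm);		/* CPU 0: mmu_gather flush tlb */

	return window && !toy_tlbs_disagree(&mm);
}
```

In this model the window closes only once the deferred flush runs; flushing before CPU 1 touches the address never opens it.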
But strangely, I didn't see any relevant fix code. Maybe I missed
something, or is this guaranteed by userland?
I'm still quite confused about this. Is anyone familiar with this
part?
This is not a new issue introduced by this patch series, and I have
sent a separate RFC patch [1] to track this issue.
I will remove this part of the handling in the next version.
[1]. https://lore.kernel.org/lkml/20240815120715.14516-1-zhengqi.arch@xxxxxxxxxxxxx/
Thanks,
Qi
Anyway, this series defines the following two functions to be
implemented by the architecture. If the architecture does not allow the
above two situations, it should define these two functions to flush the
tlb before set_pmd_at().
- arch_flush_tlb_before_set_huge_page
- arch_flush_tlb_before_set_pte_page
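A rough user-space sketch of how such hooks could be wired up is below. Only the two hook names come from this series; the parameter lists, the `TOY_ARCH_NEEDS_FLUSH` switch, `struct toy_arch_mm` and the `toy_` helpers are assumptions made for illustration, and a plain store stands in for set_pmd_at().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct mm_struct; counts flushes only. */
struct toy_arch_mm {
	unsigned long flush_count;
};

/* Assumption: an arch that forbids both situations opts in here. */
#define TOY_ARCH_NEEDS_FLUSH 1

#if TOY_ARCH_NEEDS_FLUSH
static void toy_flush_tlb_mm(struct toy_arch_mm *mm)
{
	mm->flush_count++;	/* a real arch would flush the TLB here */
}

static inline void arch_flush_tlb_before_set_huge_page(struct toy_arch_mm *mm,
							unsigned long addr)
{
	(void)addr;
	toy_flush_tlb_mm(mm);
}

static inline void arch_flush_tlb_before_set_pte_page(struct toy_arch_mm *mm,
						       unsigned long addr)
{
	(void)addr;
	toy_flush_tlb_mm(mm);
}
#else
/* Architectures that tolerate both situations get no-op hooks. */
static inline void arch_flush_tlb_before_set_huge_page(struct toy_arch_mm *mm,
							unsigned long addr)
{
	(void)mm;
	(void)addr;
}

static inline void arch_flush_tlb_before_set_pte_page(struct toy_arch_mm *mm,
						       unsigned long addr)
{
	(void)mm;
	(void)addr;
}
#endif

/* Caller side: flush first, then install the new pmd value. */
static void toy_install_huge_page(struct toy_arch_mm *mm, unsigned long addr,
				  unsigned long *pmd, unsigned long new_val)
{
	assert(mm != NULL && pmd != NULL);

	arch_flush_tlb_before_set_huge_page(mm, addr);
	*pmd = new_val;		/* stands in for set_pmd_at() */
}

static void toy_install_pte_page(struct toy_arch_mm *mm, unsigned long addr,
				 unsigned long *pmd, unsigned long new_val)
{
	assert(mm != NULL && pmd != NULL);

	arch_flush_tlb_before_set_pte_page(mm, addr);
	*pmd = new_val;		/* stands in for set_pmd_at() */
}
```

With the switch set to 0 the callers stay unchanged and the hooks compile away, which is the usual reason for putting the decision behind an arch-provided function.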
[...]