On Tue, Nov 19, 2024 at 4:10 AM Qi Zheng <zhengqi.arch@xxxxxxxxxxxxx> wrote: > > In the generic ptep_get_and_clear() implementation, it is just a simple > combination of ptep_get() and pte_clear(). But for some architectures > (such as x86 and arm64, etc), the hardware will modify the A/D bits of the > page table entry, so the ptep_get_and_clear() needs to be overwritten > and implemented as an atomic operation to avoid contention, which has a > performance cost. > > The commit d283d422c6c4 ("x86: mm: add x86_64 support for page table > check") adds the ptep_clear() on the x86, and makes it call > ptep_get_and_clear() when CONFIG_PAGE_TABLE_CHECK is enabled. The page > table check feature does not actually care about the A/D bits, so only > ptep_get() + pte_clear() should be called. But considering that the page > table check is a debug option, this should not have much of an impact. > > But then the commit de8c8e52836d ("mm: page_table_check: add hooks to > public helpers") changed ptep_clear() to unconditionally call > ptep_get_and_clear(), so that the CONFIG_PAGE_TABLE_CHECK check can be > put into the page table check stubs (in include/linux/page_table_check.h). > This also cause performance loss to the kernel without > CONFIG_PAGE_TABLE_CHECK enabled, which doesn't make sense. > > Currently ptep_clear() is only used in debug code and in khugepaged > collapse paths, which are fairly expensive. So the cost of an extra atomic > RMW operation does not matter. But this may be used for other paths in the > future. After all, for the present pte entry, we need to call ptep_clear() > instead of pte_clear() to ensure that PAGE_TABLE_CHECK works properly. > > So to be more precise, just calling ptep_get() and pte_clear() in the > ptep_clear(). > > Signed-off-by: Qi Zheng <zhengqi.arch@xxxxxxxxxxxxx> > --- > include/linux/pgtable.h | 5 ++++- > 1 file changed, 4 insertions(+), 1 deletion(-) > > diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h > index adef9d6e9b1ba..e59decd22e1cb 100644 > --- a/include/linux/pgtable.h > +++ b/include/linux/pgtable.h > @@ -533,7 +533,10 @@ static inline void clear_young_dirty_ptes(struct vm_area_struct *vma, > static inline void ptep_clear(struct mm_struct *mm, unsigned long addr, > pte_t *ptep) > { > - ptep_get_and_clear(mm, addr, ptep); > + pte_t pte = ptep_get(ptep); > + > + pte_clear(mm, addr, ptep); > + page_table_check_pte_clear(mm, pte); > } Reviewed-by: Pasha Tatashin <pasha.tatashin@xxxxxxxxxx> Thanks, Pasha