Re: [RFC PATCH] docs/mm: add VMA locks documentation

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 





On 2024/11/7 02:09, Jann Horn wrote:
On Wed, Nov 6, 2024 at 4:09 AM Qi Zheng <zhengqi.arch@xxxxxxxxxxxxx> wrote:
On 2024/11/5 05:29, Jann Horn wrote:
On Mon, Nov 4, 2024 at 5:42 PM Lorenzo Stoakes

[...]


I think it's important to know about the existence of hardware writes
because it means you need atomic operations when making changes to
page tables. Like, for example, in many cases when changing a present
PTE, you can't even use READ_ONCE()/WRITE_ONCE() for PTEs and need
atomic RMW operations instead - see for example ptep_get_and_clear(),
which is basically implemented in arch code as an atomic xchg so that
it can't miss concurrent A/D bit updates.


Totally agree! But I noticed before that ptep_clear() doesn't seem
to need atomic operations because it doesn't need to care about the
A/D bit.

I once looked at the history of how the ptep_clear() was introduced.
If you are interested, you can take a look at my local draft below.
Maybe I missed something.

```
mm: pgtable: make ptep_clear() non-atomic

      In the generic ptep_get_and_clear() implementation, it is just a simple
      combination of ptep_get() and pte_clear(). But for some architectures
      (such as x86 and arm64, etc), the hardware will modify the A/D bits
of the
      page table entry, so the ptep_get_and_clear() needs to be overwritten
      and implemented as an atomic operation to avoid contention, which has a
      performance cost.

      The commit d283d422c6c4 ("x86: mm: add x86_64 support for page table
      check") adds the ptep_clear() on the x86, and makes it call
      ptep_get_and_clear() when CONFIG_PAGE_TABLE_CHECK is enabled. The page
      table check feature does not actually care about the A/D bits, so only
      ptep_get() + pte_clear() should be called. But considering that the
page
      table check is a debug option, this should not have much of an impact.

      But then the commit de8c8e52836d ("mm: page_table_check: add hooks to
      public helpers") changed ptep_clear() to unconditionally call
      ptep_get_and_clear(), so that the  CONFIG_PAGE_TABLE_CHECK check can be
      put into the page table check stubs (in
include/linux/page_table_check.h).
      This also cause performance loss to the kernel without
      CONFIG_PAGE_TABLE_CHECK enabled, which doesn't make sense.

      To fix it, just calling ptep_get() and pte_clear() in the ptep_clear().

      Signed-off-by: Qi Zheng <zhengqi.arch@xxxxxxxxxxxxx>

diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index 117b807e3f894..2ace92293f5f5 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -506,7 +506,10 @@ static inline void clear_young_dirty_ptes(struct
vm_area_struct *vma,
   static inline void ptep_clear(struct mm_struct *mm, unsigned long addr,
                                pte_t *ptep)
   {
-       ptep_get_and_clear(mm, addr, ptep);
+       pte_t pte = ptep_get(ptep);
+
+       pte_clear(mm, addr, ptep);
+       page_table_check_pte_clear(mm, pte);
   }

```

ptep_clear() is currently only used in debug code and in khugepaged
collapse paths, which are fairly expensive, so I don't think the cost
of an extra atomic RMW op should matter here; but yeah, the change
looks correct to me.

Thanks for double-checking it! And I agree that an extra atomic RMW op
is not a problem in the current call path. But this may be used for
other paths in the future. After all, for the present pte entry, we
need to call ptep_clear() instead of pte_clear() to ensure that
PAGE_TABLE_CHECK works properly.

Maybe this is worth sending a formal patch. ;)

Thanks!





[Index of Archives]     [Kernel Newbies]     [Security]     [Netfilter]     [Bugtraq]     [Linux FS]     [Yosemite Forum]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Samba]     [Video 4 Linux]     [Device Mapper]     [Linux Resources]

  Powered by Linux