My apologies for the noise: a blank line between Cc and Subject has broken the
subject and grouping in lore.

Please ignore this; I will resend.


On 11/05/2023 13:58, Ryan Roberts wrote:
> Date: Thu, 11 May 2023 11:38:28 +0100
> Subject: [PATCH v1 0/5] Encapsulate PTE contents from non-arch code
>
> Hi All,
>
> This series improves the encapsulation of pte entries by disallowing non-arch
> code from directly dereferencing pte_t pointers. Instead code must use a new
> helper, `pte_t ptep_deref(pte_t *ptep)`. By default, this helper does a direct
> dereference of the pointer, so generated code should be exactly the same. But
> its presence sets us up for arch code being able to override the default to
> "virtualize" the ptes without needing to maintain a shadow table.
>
> I intend to take advantage of this for arm64 to enable use of its "contiguous
> bit" to coalesce multiple ptes into a single tlb entry, reducing pressure and
> improving performance. I have an RFC for the first part of this work at [1]. The
> cover letter there also explains the second part, which this series is enabling.
>
> I intend to post an RFC for the contpte changes in due course, but it would be
> good to get the ball rolling on this enabler.
>
> There are 2 reasons that I need the encapsulation:
>
>  - Prevent leaking the arch-private PTE_CONT bit to the core code. If the core
>    code reads a pte that contains this bit, it could end up calling
>    set_pte_at() with the bit set which would confuse the implementation. So we
>    can always clear PTE_CONT in ptep_deref() (and ptep_get()) to avoid a leaky
>    abstraction.
>  - Contiguous ptes have a single access and dirty bit for the contiguous range.
>    So we need to "mix-in" those bits when the core is dereferencing a pte that
>    lies in the contig range. There is code that dereferences the pte then takes
>    different actions based on access/dirty (see e.g. write_protect_page()).
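
To make the two requirements above concrete, here is a minimal userspace sketch.
It is not kernel code and is not taken from the series. The pte_t layout, the
DEMO_* bit positions, the range size and demo_arch_ptep_deref() are all
hypothetical; only the name and default behaviour of ptep_deref() come from the
text above. The override takes a table base and index purely to keep the model
simple, whereas the real helper takes only the pte pointer. OR-ing the
access/dirty bits across the range is one plausible reading of "mix-in", not a
statement of how arm64 would implement it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace model of a pte; the layout is invented for illustration. */
typedef struct { uint64_t val; } pte_t;

#define DEMO_PTE_YOUNG	(1ULL << 0)	/* hypothetical "accessed" bit */
#define DEMO_PTE_DIRTY	(1ULL << 1)	/* hypothetical "dirty" bit */
#define DEMO_PTE_CONT	(1ULL << 2)	/* hypothetical arch-private bit */
#define DEMO_CONT_PTES	4		/* hypothetical entries per range */

/* Default helper as described in the cover letter: a plain dereference. */
static inline pte_t ptep_deref(pte_t *ptep)
{
	return *ptep;
}

/*
 * Hypothetical arch override. For an entry inside a contiguous range it
 * gathers the access/dirty bits of every entry in that range, then hides
 * the arch-private bit before handing the value to the caller.
 */
static inline pte_t demo_arch_ptep_deref(const pte_t *table, size_t idx)
{
	pte_t pte = table[idx];

	if (pte.val & DEMO_PTE_CONT) {
		size_t start = idx - (idx % DEMO_CONT_PTES);

		for (size_t i = start; i < start + DEMO_CONT_PTES; i++)
			pte.val |= table[i].val &
				   (DEMO_PTE_YOUNG | DEMO_PTE_DIRTY);

		pte.val &= ~DEMO_PTE_CONT;
	}

	return pte;
}
```

With such an override, a caller that later writes the value back would never
see the arch-private bit, and a caller that branches on access/dirty would see
the state of the whole range rather than of one entry.
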
>
> While ptep_get() and ptep_get_lockless() already exist, both of them are
> implemented using READ_ONCE() by default. While we could use ptep_get() instead
> of the new ptep_deref(), I didn't want to risk performance regression.
> Alternatively, all call sites that currently use ptep_get() that need the
> lockless behaviour could be upgraded to ptep_get_lockless() and ptep_get() could
> be downgraded to a simple dereference. That would be cleanest, but is a much
> bigger (and likely error prone) change because all the arch code would need to
> be updated for the new definitions of ptep_get().
>
> The series is split up as follows:
>
> patches 1-2: Fix bugs where code was _setting_ ptes directly, rather than
>              using set_pte_at() and friends.
> patch 3:     Fix highmem unmapping issue I spotted while doing the work.
> patch 4:     Introduce the new ptep_deref() helper with default implementation.
> patch 5:     Convert all direct dereferences to use ptep_deref().
>
> [1] https://lore.kernel.org/linux-mm/20230414130303.2345383-1-ryan.roberts@xxxxxxx/
>
> Thanks,
> Ryan
>
>
> Ryan Roberts (5):
>   mm: vmalloc must set pte via arch code
>   mm: damon must atomically clear young on ptes and pmds
>   mm: Fix failure to unmap pte on highmem systems
>   mm: Add new ptep_deref() helper to fully encapsulate pte_t
>   mm: ptep_deref() conversion
>
>  .../drm/i915/gem/selftests/i915_gem_mman.c |   8 +-
>  drivers/misc/sgi-gru/grufault.c            |   2 +-
>  drivers/vfio/vfio_iommu_type1.c            |   7 +-
>  drivers/xen/privcmd.c                      |   2 +-
>  fs/proc/task_mmu.c                         |  33 +++---
>  fs/userfaultfd.c                           |   6 +-
>  include/linux/hugetlb.h                    |   2 +-
>  include/linux/mm_inline.h                  |   2 +-
>  include/linux/pgtable.h                    |  13 ++-
>  kernel/events/uprobes.c                    |   2 +-
>  mm/damon/ops-common.c                      |  18 ++-
>  mm/damon/ops-common.h                      |   4 +-
>  mm/damon/paddr.c                           |   6 +-
>  mm/damon/vaddr.c                           |  14 ++-
>  mm/filemap.c                               |   2 +-
>  mm/gup.c                                   |  21 ++--
>  mm/highmem.c                               |  12 +-
>  mm/hmm.c                                   |   2 +-
>  mm/huge_memory.c                           |   4 +-
>  mm/hugetlb.c                               |   2 +-
>  mm/hugetlb_vmemmap.c                       |   6 +-
>  mm/kasan/init.c                            |   9 +-
>  mm/kasan/shadow.c                          |  10 +-
>  mm/khugepaged.c                            |  24 ++--
>  mm/ksm.c                                   |  22 ++--
>  mm/madvise.c                               |   6 +-
>  mm/mapping_dirty_helpers.c                 |   4 +-
>  mm/memcontrol.c                            |   4 +-
>  mm/memory-failure.c                        |   6 +-
>  mm/memory.c                                | 103 +++++++++---------
>  mm/mempolicy.c                             |   6 +-
>  mm/migrate.c                               |  14 ++-
>  mm/migrate_device.c                        |  14 ++-
>  mm/mincore.c                               |   2 +-
>  mm/mlock.c                                 |   6 +-
>  mm/mprotect.c                              |   8 +-
>  mm/mremap.c                                |   2 +-
>  mm/page_table_check.c                      |   4 +-
>  mm/page_vma_mapped.c                       |  26 +++--
>  mm/pgtable-generic.c                       |   2 +-
>  mm/rmap.c                                  |  32 +++---
>  mm/sparse-vmemmap.c                        |   8 +-
>  mm/swap_state.c                            |   4 +-
>  mm/swapfile.c                              |  16 +--
>  mm/userfaultfd.c                           |   4 +-
>  mm/vmalloc.c                               |  11 +-
>  mm/vmscan.c                                |  14 ++-
>  virt/kvm/kvm_main.c                        |   9 +-
>  48 files changed, 302 insertions(+), 236 deletions(-)
>
> --
> 2.25.1
>
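
For readers skimming the diffstat, the mechanical conversion described for
patch 5 might look roughly like the before/after pair below. This is a
userspace illustration only, not an excerpt from any of the patches: the pte_t
layout, DEMO_PTE_PRESENT, demo_pte_present() and both count_present_*()
functions are hypothetical, and ptep_deref() is modelled as the plain
dereference the cover letter describes as the default.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace model of a pte; the layout is invented for illustration. */
typedef struct { uint64_t val; } pte_t;

#define DEMO_PTE_PRESENT	(1ULL << 0)	/* hypothetical "present" bit */

/* Default helper as described in the cover letter: a plain dereference. */
static inline pte_t ptep_deref(pte_t *ptep)
{
	return *ptep;
}

/* Hypothetical predicate standing in for an arch's pte test helpers. */
static inline int demo_pte_present(pte_t pte)
{
	return (pte.val & DEMO_PTE_PRESENT) != 0;
}

/* Before: the caller dereferences the pte pointer directly. */
static size_t count_present_direct(pte_t *ptep, size_t nr)
{
	size_t count = 0;

	for (size_t i = 0; i < nr; i++, ptep++)
		if (demo_pte_present(*ptep))
			count++;

	return count;
}

/* After: every read goes through the helper, so an arch can intercept it. */
static size_t count_present_deref(pte_t *ptep, size_t nr)
{
	size_t count = 0;

	for (size_t i = 0; i < nr; i++, ptep++) {
		pte_t pte = ptep_deref(ptep);

		if (demo_pte_present(pte))
			count++;
	}

	return count;
}
```

With the default helper the two functions return the same result, which matches
the expectation in the cover letter that the conversion by itself should not
change behaviour.
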