On 30/06/2023 02:54, John Hubbard wrote:
> On 6/22/23 07:42, Ryan Roberts wrote:
>> With the ptep API sufficiently refactored, we can now introduce a new
>> "contpte" API layer, which transparently manages the PTE_CONT bit for
>> user mappings. Whenever it detects a set of PTEs that meet the
>> requirements for a contiguous range, the PTEs are re-painted with the
>> PTE_CONT bit.
>>
>> This initial change provides a baseline that can be optimized in future
>> commits. That said, fold/unfold operations (which imply tlb
>> invalidation) are avoided where possible with a few tricks for
>> access/dirty bit management.
>>
>> Write-enable and write-protect modifications are likely non-optimal and
>> likely incur a regression in fork() performance. This will be addressed
>> separately.
>>
>> Signed-off-by: Ryan Roberts <ryan.roberts@xxxxxxx>
>> ---
>
> Hi Ryan!
>
> While trying out the full series from your gitlab features/granule_perf/all
> branch, I found it necessary to EXPORT a symbol in order to build this.

Thanks for the bug report!

> Please see below:
>
> ...
>> +
>> +pte_t contpte_ptep_get(pte_t *ptep, pte_t orig_pte)
>> +{
>> +	/*
>> +	 * Gather access/dirty bits, which may be populated in any of the ptes
>> +	 * of the contig range. We are guaranteed to be holding the PTL, so any
>> +	 * contiguous range cannot be unfolded or otherwise modified under our
>> +	 * feet.
>> +	 */
>> +
>> +	pte_t pte;
>> +	int i;
>> +
>> +	ptep = contpte_align_down(ptep);
>> +
>> +	for (i = 0; i < CONT_PTES; i++, ptep++) {
>> +		pte = __ptep_get(ptep);
>> +
>> +		/*
>> +		 * Deal with the partial contpte_ptep_get_and_clear_full() case,
>> +		 * where some of the ptes in the range may be cleared but others
>> +		 * are still to do. See contpte_ptep_get_and_clear_full().
>> +		 */
>> +		if (pte_val(pte) == 0)
>> +			continue;
>> +
>> +		if (pte_dirty(pte))
>> +			orig_pte = pte_mkdirty(orig_pte);
>> +
>> +		if (pte_young(pte))
>> +			orig_pte = pte_mkyoung(orig_pte);
>> +	}
>> +
>> +	return orig_pte;
>> +}
>
> Here we need something like this, in order to get it to build in all
> possible configurations:
>
> EXPORT_SYMBOL_GPL(contpte_ptep_get);
>
> (and a corresponding "#include <linux/export.h>" at the top of the file).
>
> Because, the static inline functions invoke this routine, above.

A quick grep through the drivers directory shows:

ptep_get() is used by:
  - drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
  - drivers/misc/sgi-gru/grufault.c
  - drivers/vfio/vfio_iommu_type1.c
  - drivers/xen/privcmd.c

set_pte_at() is used by:
  - drivers/gpu/drm/i915/i915_mm.c
  - drivers/xen/xlate_mmu.c

None of the other symbols are called from drivers, but I guess it is possible
that out-of-tree modules are calling others. So on the basis that these symbols
were previously pure inline, I propose to export all the contpte_* symbols
using EXPORT_SYMBOL(), so that anything that was previously calling them
successfully can continue to do so. Will include in v2.

Thanks,
Ryan

>
> thanks,
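
To illustrate why the export is needed, here is a minimal userspace sketch. It
assumes the ptep_get() wrapper stays static inline in the header and only
forwards to the out-of-line contpte_ptep_get() when the entry is valid and has
PTE_CONT set. The pte_t type, the bit values, CONT_PTES, __ptep_get() and
contpte_align_down() below are simplified stand-ins and not the arm64
definitions, and EXPORT_SYMBOL() is stubbed out so the file compiles outside
the kernel.

```c
/* Userspace sketch only. All types and bit values are simplified stand-ins. */
#include <assert.h>
#include <stdint.h>

#ifndef EXPORT_SYMBOL
/* Stub: the kernel macro comes from <linux/export.h>. Expands to a harmless
 * forward declaration so the trailing ';' is valid at file scope. */
#define EXPORT_SYMBOL(sym) struct export_stub_##sym
#endif

typedef struct { uint64_t val; } pte_t;

#define CONT_PTES	16		/* assumed entries per contiguous block */
#define PTE_VALID	(1ULL << 0)	/* illustrative bit positions */
#define PTE_YOUNG	(1ULL << 10)
#define PTE_CONT	(1ULL << 52)
#define PTE_DIRTY	(1ULL << 55)

static inline pte_t __ptep_get(pte_t *ptep)
{
	return *ptep;
}

/* Round ptep down to the first entry of its CONT_PTES-sized block. */
static pte_t *contpte_align_down(pte_t *ptep)
{
	uintptr_t span = CONT_PTES * sizeof(pte_t);

	return (pte_t *)((uintptr_t)ptep & ~(span - 1));
}

/* ---- would live in the .c file: out of line, so modules need the export ---- */
pte_t contpte_ptep_get(pte_t *ptep, pte_t orig_pte)
{
	int i;

	ptep = contpte_align_down(ptep);
	assert((uintptr_t)ptep % (CONT_PTES * sizeof(pte_t)) == 0);

	for (i = 0; i < CONT_PTES; i++, ptep++) {
		pte_t pte = __ptep_get(ptep);

		if (pte.val == 0)	/* entry already cleared, skip it */
			continue;
		if (pte.val & PTE_DIRTY)
			orig_pte.val |= PTE_DIRTY;
		if (pte.val & PTE_YOUNG)
			orig_pte.val |= PTE_YOUNG;
	}

	return orig_pte;
}
EXPORT_SYMBOL(contpte_ptep_get);

/* ---- would live in the header: inlined into every caller, modules included ---- */
static inline pte_t ptep_get(pte_t *ptep)
{
	pte_t pte = __ptep_get(ptep);

	if (!(pte.val & PTE_VALID) || !(pte.val & PTE_CONT))
		return pte;

	/* This call is what ends up as an external reference in a module. */
	return contpte_ptep_get(ptep, pte);
}
```

Built as a single program this links without trouble, because the caller and
the definition sit in the same object. The failure reported above only shows
up when the caller is a module and the definition is built into the kernel
image: the inlined wrapper leaves the module with an unresolved reference to
contpte_ptep_get() unless that symbol is exported.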