On 6/22/23 07:42, Ryan Roberts wrote:
With the ptep API sufficiently refactored, we can now introduce a new
"contpte" API layer, which transparently manages the PTE_CONT bit for
user mappings. Whenever it detects a set of PTEs that meet the
requirements for a contiguous range, the PTEs are re-painted with the
PTE_CONT bit.

This initial change provides a baseline that can be optimized in future
commits. That said, fold/unfold operations (which imply tlb
invalidation) are avoided where possible with a few tricks for
access/dirty bit management.

Write-enable and write-protect modifications are likely non-optimal and
likely incur a regression in fork() performance. This will be addressed
separately.

Signed-off-by: Ryan Roberts <ryan.roberts@xxxxxxx>
---
Hi Ryan!

While trying out the full series from your gitlab
features/granule_perf/all branch, I found it necessary to EXPORT a
symbol in order to build this. Please see below:

...
+
+pte_t contpte_ptep_get(pte_t *ptep, pte_t orig_pte)
+{
+	/*
+	 * Gather access/dirty bits, which may be populated in any of the ptes
+	 * of the contig range. We are guaranteed to be holding the PTL, so any
+	 * contiguous range cannot be unfolded or otherwise modified under our
+	 * feet.
+	 */
+
+	pte_t pte;
+	int i;
+
+	ptep = contpte_align_down(ptep);
+
+	for (i = 0; i < CONT_PTES; i++, ptep++) {
+		pte = __ptep_get(ptep);
+
+		/*
+		 * Deal with the partial contpte_ptep_get_and_clear_full() case,
+		 * where some of the ptes in the range may be cleared but others
+		 * are still to do. See contpte_ptep_get_and_clear_full().
+		 */
+		if (pte_val(pte) == 0)
+			continue;
+
+		if (pte_dirty(pte))
+			orig_pte = pte_mkdirty(orig_pte);
+
+		if (pte_young(pte))
+			orig_pte = pte_mkyoung(orig_pte);
+	}
+
+	return orig_pte;
+}
Here we need something like this, in order to get it to build in all
possible configurations:

EXPORT_SYMBOL_GPL(contpte_ptep_get);

(and a corresponding "#include <linux/export.h>" at the top of the file).

This is needed because the static inline functions invoke the routine
above.

thanks,
-- 
John Hubbard
NVIDIA
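For readers who want to experiment with the access/dirty gathering done by contpte_ptep_get() outside the kernel, here is a minimal user-space sketch. It is only a model under stated assumptions: model_pte_t, the MODEL_* bit positions, MODEL_CONT_PTES, and the index-based interface are all hypothetical stand-ins, not the arm64 definitions, and there is no locking or TLB handling. It shows only the loop's logic: align down to the start of the contiguous block, skip entries that have already been cleared, and fold any dirty or young bit found in the block into the caller's copy of the entry.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; values are illustrative, not the arm64 ones. */
#define MODEL_CONT_PTES	16
#define MODEL_PTE_VALID	(UINT64_C(1) << 0)
#define MODEL_PTE_YOUNG	(UINT64_C(1) << 1)
#define MODEL_PTE_DIRTY	(UINT64_C(1) << 2)

typedef struct {
	uint64_t val;
} model_pte_t;

/*
 * Model of the gathering loop: 'table' plays the role of the page table,
 * 'idx' selects the entry being read, and 'orig_pte' is the caller's copy
 * of that entry. Returns orig_pte with the dirty/young bits of the whole
 * contiguous block folded in.
 */
static model_pte_t model_contpte_ptep_get(const model_pte_t *table,
					   size_t idx, model_pte_t orig_pte)
{
	size_t start, i;

	assert(table != NULL);

	/* Index-based equivalent of aligning the pointer down to the block. */
	start = idx - (idx % MODEL_CONT_PTES);

	for (i = start; i < start + MODEL_CONT_PTES; i++) {
		model_pte_t pte = table[i];

		/* Entry already cleared by a partial teardown: skip it. */
		if (pte.val == 0)
			continue;

		if (pte.val & MODEL_PTE_DIRTY)
			orig_pte.val |= MODEL_PTE_DIRTY;

		if (pte.val & MODEL_PTE_YOUNG)
			orig_pte.val |= MODEL_PTE_YOUNG;
	}

	return orig_pte;
}
```

In this model, reading any entry of a block reports dirty or young as soon as one entry in that block has the bit set, while entries in a neighbouring block have no influence on the result.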