On Wed, Dec 11, 2024 at 04:01:40PM +0000, Mikołaj Lenczewski wrote:
> When converting a region via contpte_convert() to use mTHP, we have two
> different goals. We have to mark each entry as contiguous, and we would
> like to smear the dirty and young (access) bits across all entries in
> the contiguous block. Currently, we do this by first accumulating the
> dirty and young bits in the block, using an atomic
> __ptep_get_and_clear() and the relevant pte_{dirty,young}() calls,
> performing a tlbi, and finally smearing the correct bits across the
> block using __set_ptes().
>
> This approach works fine for BBM level 0, but with support for BBM level
> 2 we are allowed to reorder the tlbi to after setting the pagetable
> entries. This reordering means that other threads will not see an
> invalid pagetable entry, instead operating on stale data, until we have
> performed our smearing and issued the invalidation. Avoiding this
> invalid entry reduces faults in other threads, and thus improves
> performance marginally (more so when there are more threads).

Please provide the performance data.

Will