On 15/11/2023 21:37, Andrew Morton wrote:
> On Wed, 15 Nov 2023 16:30:05 +0000 Ryan Roberts <ryan.roberts@xxxxxxx> wrote:
>
>> However, the primary motivation for this change is to reduce the number
>> of tlb maintenance operations that the arm64 backend has to perform
>> during fork
>
> Do you have a feeling for how much performance improved due to this?

The commit log for patch 13 (the one which implements ptep_set_wrprotects() for
arm64) has performance numbers for a fork() microbenchmark with and without the
optimization:

---8<---

I see huge performance regression when PTE_CONT support was added, then
the regression is mostly fixed with the addition of this change. The
following shows regression relative to before PTE_CONT was enabled
(bigger negative value is bigger regression):

|   cpus |   before opt |   after opt |
|-------:|-------------:|------------:|
|      1 |       -10.4% |       -5.2% |
|      8 |       -15.4% |       -3.5% |
|     16 |       -38.7% |       -3.7% |
|     24 |       -57.0% |       -4.4% |
|     32 |       -65.8% |       -5.4% |

---8<---

Note that these numbers were collected on Ampere Altra, where TLBI tends to
have a high cost.

>
> Are there other architectures which might similarly benefit?  By
> implementing ptep_set_wrprotects(), it appears.  If so, what sort of
> gains might they see?

The rationale for this is to reduce the cost to arm64 of managing
contpte-mappings. If other architectures support contpte-mappings then they
could benefit from this API for the same reasons that arm64 benefits. I have a
vague understanding that riscv has a similar concept to arm64's contiguous bit,
so perhaps it is a future candidate. But I'm not familiar with the details of
the riscv feature, so I couldn't say whether it would be likely to see the same
level of perf improvement as arm64.

Thanks,
Ryan