On Wed, Nov 13, 2024 at 02:28:16PM +0000, Lorenzo Stoakes wrote:
> On Tue, Nov 12, 2024 at 11:46:32AM -0800, Suren Baghdasaryan wrote:
> > Back when per-vma locks were introduced, vm_lock was moved out of
> > vm_area_struct in [1] because of the performance regression caused by
> > false cacheline sharing. Recent investigation [2] revealed that the
> > regression is limited to a rather old Broadwell microarchitecture and
> > even there it can be mitigated by disabling adjacent cacheline
> > prefetching, see [3].
>
> I don't see a motivating reason as to why we want to do this? We increase
> memory usage here which is not good, but later lock optimisation mitigates
> it, but why wouldn't we just do the lock optimisations and use less memory
> overall?
>

Where would you put the lock in that case, though?

With the patchset it sits with the vma it protects, so there is no
false sharing with other instances of the same struct.

If you make the locks separately allocated and packed, they false-share
between the different vmas using them (in fact, this is what currently
happens). If you make sure to pad them, that's 64 bytes per object,
the majority of which is empty space.