On Mon, Jan 9, 2023 at 9:55 PM Suren Baghdasaryan <surenb@xxxxxxxxxx> wrote:
> vma->lock being part of the vm_area_struct causes performance regression
> during page faults because during contention its count and owner fields
> are constantly updated and having other parts of vm_area_struct used
> during page fault handling next to them causes constant cache line
> bouncing. Fix that by moving the lock outside of the vm_area_struct.
> All attempts to keep vma->lock inside vm_area_struct in a separate
> cache line still produce performance regression especially on NUMA
> machines. Smallest regression was achieved when lock is placed in the
> fourth cache line but that bloats vm_area_struct to 256 bytes.

Just checking: When you tested putting the lock in different cache
lines, did you force the slab allocator to actually store the
vm_area_struct with cacheline alignment (by setting SLAB_HWCACHE_ALIGN
on the slab or with a ____cacheline_aligned_in_smp on the struct
definition)?
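
To illustrate why the alignment of the object itself matters here, below is a
minimal userspace sketch. It is not kernel code: struct demo_vma, its fields,
and the helper names are hypothetical, and a 64-byte cache line is assumed.
It only shows that a field's offset inside a struct decides which cache line
it lands on when the object's base address is itself cache-line aligned.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 64u /* assumption: 64-byte cache lines */

/* Hypothetical layout for illustration; NOT the real vm_area_struct. */
struct demo_vma {
	unsigned char hot_fields[CACHE_LINE]; /* data read during page faults */
	unsigned char lock[CACHE_LINE];       /* stand-in for the contended lock */
};

/* Index of the cache line that contains address addr. */
static uintptr_t line_of(uintptr_t addr)
{
	return addr / CACHE_LINE;
}

/*
 * For an object placed at address base, report whether the last byte of
 * hot_fields and the first byte of lock fall into the same cache line.
 */
static bool lock_shares_line(uintptr_t base)
{
	uintptr_t first_lock = base + offsetof(struct demo_vma, lock);
	uintptr_t last_hot = first_lock - 1;

	return line_of(last_hot) == line_of(first_lock);
}
```

With a cache-line-aligned base such as 0x1000, lock_shares_line() returns
false, so the lock has a line to itself. With a base of 0x1000 + 32, as could
happen if an allocator packs objects without cache-line alignment, it returns
true and the lock shares a line with the hot fields even though the struct
layout looks correct. In the kernel, passing SLAB_HWCACHE_ALIGN to
kmem_cache_create() or annotating the struct with
____cacheline_aligned_in_smp are the usual ways to request aligned objects.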