On Mon, Nov 11, 2024 at 2:18 PM Davidlohr Bueso <dave@xxxxxxxxxxxx> wrote:
>
> On Mon, 11 Nov 2024, Suren Baghdasaryan wrote:
>
> >To minimize memory overhead, vm_lock implementation is changed from
> >using rw_semaphore (40 bytes) to an atomic (8 bytes) and several
> >vm_area_struct members are moved into the last cacheline, resulting
> >in a less fragmented structure:
>
> I am not a fan of building a custom lock, replacing a standard one.

Understandable.

> How much do we really care about this?

In the Android world I got complaints after introducing per-vma locks.
More specifically, moving from 5.15 to 6.1 kernel, where we first
started using these locks, the memory usage increased by 10MB on
average. Not a huge deal but if we can trim it without too much
complexity, that would be definitely appreciated.

> rwsems are quite optimized and are known to heavily affect mm
> performance altogether.

I know, that's why I spent an additional week profiling the new
implementation. I asked Oliver (CC'ed) to rerun the tests he used in
https://lore.kernel.org/all/ZsQyI%2F087V34JoIt@xsang-OptiPlex-9020/ to
confirm no regressions.

>
> ...
>
> >Performance measurements using pft test on x86 do not show considerable
> >difference, on Pixel 6 running Android it results in 3-5% improvement in
> >faults per second.
>
> pft is a very micro benchmark, these results do not justify this
> change, imo.

I'm not really trying to claim performance gains here. I just want to
make sure there are no regressions.
Thanks,
Suren.

>
> Thanks,
> Davidlohr