On Fri, Aug 9, 2024 at 9:56 AM Suren Baghdasaryan <surenb@xxxxxxxxxx> wrote:
>
> On Fri, Aug 9, 2024 at 3:09 PM Vlastimil Babka <vbabka@xxxxxxx> wrote:
> >
> > On 8/9/24 05:57, Suren Baghdasaryan wrote:
> > > Maybe it has something to do with NUMA? The system I'm running has
> > > 2 NUMA nodes:
> >
> > I kinda doubt the NUMA aspect. Whether you allocate a vma that embeds
> > a lock, or a vma and immediately the separate lock, it's unlikely they
> > would end up on different nodes, so from the NUMA perspective I don't
> > see a difference. And if they did end up on separate nodes, it would
> > more likely be worse for the case of separate locks, not better.
>
> I have a UMA machine. Will try the test there as well. It won't
> provide hard proof but might at least give some hints.

Ok, disabling adjacent cacheline prefetching seems to do the trick (or
at least it cuts the regression down drastically):

Hmean     faults/cpu-1     470577.6434 (   0.00%)   470745.2649 *   0.04%*
Hmean     faults/cpu-4     445862.9701 (   0.00%)   445572.2252 *  -0.07%*
Hmean     faults/cpu-7     422516.4002 (   0.00%)   422677.5591 *   0.04%*
Hmean     faults/cpu-12    344483.7047 (   0.00%)   330476.7911 *  -4.07%*
Hmean     faults/cpu-21    192836.0188 (   0.00%)   195266.8071 *   1.26%*
Hmean     faults/cpu-30    140745.9472 (   0.00%)   140655.0459 *  -0.06%*
Hmean     faults/cpu-48    110507.4310 (   0.00%)   103802.1839 *  -6.07%*
Hmean     faults/cpu-56     93507.7919 (   0.00%)    95105.1875 *   1.71%*
Hmean     faults/sec-1     470232.3887 (   0.00%)   470404.6525 *   0.04%*
Hmean     faults/sec-4    1757368.9266 (   0.00%)  1752852.8697 *  -0.26%*
Hmean     faults/sec-7    2909554.8150 (   0.00%)  2915885.8739 *   0.22%*
Hmean     faults/sec-12   4033840.8719 (   0.00%)  3845165.3277 *  -4.68%*
Hmean     faults/sec-21   3845857.7079 (   0.00%)  3890316.8799 *   1.16%*
Hmean     faults/sec-30   3838607.4530 (   0.00%)  3838861.8142 *   0.01%*
Hmean     faults/sec-48   4882118.9701 (   0.00%)  4608985.0530 *  -5.59%*
Hmean     faults/sec-56   4933535.7567 (   0.00%)  5004208.3329 *   1.43%*

Now, how do we disable prefetching extra cachelines for
vm_area_structs only?
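
AFAIK the hardware prefetcher can't be turned off per-object, so the
only per-structure knob would be layout: keep the frequently-written
lock out of the same naturally aligned 128-byte pair as the read-mostly
fields the pagefault path touches. A rough, untested sketch (field
names abbreviated, not the real layout):

/*
 * Untested sketch: the adjacent-line prefetcher fetches cachelines in
 * naturally aligned 128-byte pairs, so give the contended lock its own
 * pair. __aligned() on the struct propagates to the slab cache through
 * KMEM_CACHE()'s __alignof__(), so each object starts on a pair
 * boundary. Costs up to 128 bytes of padding per vma.
 */
#define PREFETCH_PAIR_SIZE	(2 * L1_CACHE_BYTES)	/* 128 on x86 */

struct vm_area_struct {
	/* read-mostly fields used by the fault fast path come first */
	unsigned long vm_start;
	unsigned long vm_end;
	struct mm_struct *vm_mm;
	/* ... */

	/* last member: its 128-byte pair holds only the lock + padding */
	struct vma_lock vm_lock __aligned(PREFETCH_PAIR_SIZE);
} __aligned(PREFETCH_PAIR_SIZE);

That trades memory for isolation rather than actually disabling the
prefetch, though.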