On Wed, Mar 21, 2018 at 01:44:25PM -0400, Daniel Jordan wrote:
> On 03/20/2018 04:54 AM, Aaron Lu wrote:
> ...snip...
> > reduced zone->lock contention on free path from 35% to 1.1%. Also, it
> > shows good result on parallel free(*) workload by reducing zone->lock
> > contention from 90% to almost zero(lru lock increased from almost 0 to
> > 90% though).
>
> Hi Aaron, I'm looking through your series now. Just wanted to mention
> that I'm seeing the same interaction between zone->lock and lru_lock in
> my own testing. IOW, it's not enough to fix just one or the other: both
> need attention to get good performance on a big system, at least in this
> microbenchmark we've both been using.

Agree.

> There's anti-scaling at high core counts where overall system page
> faults per second actually decrease with more CPUs added to the test.
> This happens when either zone->lock or lru_lock contention are
> completely removed, but the anti-scaling goes away when both locks are
> fixed.
>
> Anyway, I'll post some actual data on this stuff soon.

Looking forward to that, thanks.

In the meantime, I'll also try your lru_lock optimization work on top of
this patchset to see if the lock contention shifts back to zone->lock.