On 5/24/21 2:07 AM, Mel Gorman wrote:
> On Fri, May 21, 2021 at 03:13:35PM -0700, Dave Hansen wrote:
>> On 5/21/21 3:28 AM, Mel Gorman wrote:
>>> The PCP high watermark is based on the number of online CPUs so the
>>> watermarks must be adjusted during CPU hotplug. At the time of
>>> hot-remove, the number of online CPUs is already adjusted but during
>>> hot-add, a delta needs to be applied to update PCP to the correct
>>> value. After this patch is applied, the high watermarks are adjusted
>>> correctly.
>>>
>>> # grep high: /proc/zoneinfo | tail -1
>>> high: 649
>>> # echo 0 > /sys/devices/system/cpu/cpu4/online
>>> # grep high: /proc/zoneinfo | tail -1
>>> high: 664
>>> # echo 1 > /sys/devices/system/cpu/cpu4/online
>>> # grep high: /proc/zoneinfo | tail -1
>>> high: 649
>> This is actually a comment more about the previous patch, but it doesn't
>> really become apparent until the example above.
>>
>> In your example, you mentioned increased exit() performance by using
>> "vm.percpu_pagelist_fraction to increase the pcp->high value". That's
>> presumably because of the increased batching effects and fewer lock
>> acquisitions.
>>
> Yes
>
>> But, logically, doesn't that mean that, the more CPUs you have in a
>> node, the *higher* you want pcp->high to be? If we took this to the
>> extreme and had an absurd number of CPUs in a node, we could end up with
>> a too-small pcp->high value.
>>
> I see your point but I don't think increasing pcp->high for larger
> numbers of CPUs is the right answer because then reclaim can be
> triggered simply because too many PCPs have pages.
>
> To address your point requires much deeper surgery.
...
> There is value to doing something like this but it's beyond what this
> series is trying to do and doing the work without introducing regressions
> would be very difficult.

Agreed, such a solution is outside of the scope of what this set is
trying to do.

It would be nice to touch on this counter-intuitive property in the changelog, and *maybe* add a WARN_ON_ONCE() if we hit an edge case. Maybe WARN_ON_ONCE() if pcp->high gets below pcp->batch*SOMETHING.
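
To make the suggestion concrete, here is a rough userspace sketch of the kind of check I have in mind. It is not kernel code: struct pcp_model, pcp_high_too_small() and the min_batches parameter are all hypothetical names made up for illustration, and min_batches stands in for the SOMETHING above, which I am deliberately leaving unspecified. The static flag only imitates the warn-once behaviour of WARN_ON_ONCE().

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical, simplified stand-in for the kernel's per-cpu pageset.
 * Only the two fields discussed in this thread are modelled.
 */
struct pcp_model {
	int high;	/* high watermark for the per-cpu list */
	int batch;	/* pages moved to or from the per-cpu list at a time */
};

/* Imitates the "once" part of WARN_ON_ONCE(): set on the first warning. */
static bool warned_once;

/*
 * Returns true when pcp->high has fallen below batch * min_batches.
 * min_batches is an assumed tunable (the SOMETHING in the mail), not an
 * existing kernel constant. A message is printed only the first time.
 */
bool pcp_high_too_small(const struct pcp_model *pcp, int min_batches)
{
	bool too_small = pcp->high < pcp->batch * min_batches;

	if (too_small && !warned_once) {
		warned_once = true;
		fprintf(stderr, "pcp->high (%d) is below pcp->batch (%d) * %d\n",
			pcp->high, pcp->batch, min_batches);
	}
	return too_small;
}
```

In the kernel this would presumably collapse to a single WARN_ON_ONCE(pcp->high < pcp->batch * SOMETHING) wherever pcp->high is recomputed. The value of the multiplier is the open question; whatever is chosen should only fire on the absurd-number-of-CPUs edge case and not on ordinary configurations.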