On Tue, Jan 28, 2025 at 02:23:31PM -0800, Joshua Hahn wrote:
> On machines with multiple memory nodes, interleaving page allocations
> across nodes allows for better utilization of each node's bandwidth.
> Previous work by Gregory Price [1] introduced weighted interleave, which
> allowed for pages to be allocated across nodes according to user-set ratios.
>
> Ideally, these weights should be proportional to their bandwidth, so
> that under bandwidth pressure, each node uses its maximal efficient
> bandwidth and prevents latency from increasing exponentially.
>
> At the same time, we want these weights to be as small as possible.
> Having ratios that involve large co-prime numbers like 7639:1345:7 leads
> to awkward and inefficient allocations, since the node with weight 7
> will remain mostly unused (and despite being proportional to bandwidth,
> will not aid in relieving the bandwidth pressure in the other two nodes).
>
> This patch introduces an auto-configuration mode for the interleave
> weights that aims to balance the two goals of setting node weights to be
> proportional to their bandwidths and keeping the weight values low.
> In order to perform the weight re-scaling, we use an internal
> "weightiness" value (fixed to 32) that defines interleave aggression.
>
> In this auto configuration mode, node weights are dynamically updated
> every time there is a hotplug event that introduces new bandwidth.
>
> Users can also enter manual mode by writing "N" or "0" to the new "auto"
> sysfs interface. When a user enters manual mode, the system stops
> dynamically updating any of the node weights, even during hotplug events
> that can shift the optimal weight distribution. The system also enters
> manual mode any time a user sets a node's weight directly by using the
> nodeN interface introduced in [1]. On the other hand, auto mode is
> only entered by explicitly writing "Y" or "1" to the auto interface.
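
For anyone following the thread, the re-scaling idea described above can be sketched in plain userspace C. This is only an illustration under assumptions, not the code from the patch: the names WEIGHTINESS, gcd_uint() and sketch_reduce_weights() are hypothetical, and the exact rounding and clamping rules here are guesses at one reasonable way to map bandwidths into the range [1, 32] and then divide out a common factor.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical constant mirroring the "weightiness" value from the commit log. */
#define WEIGHTINESS 32u

/* Euclid's algorithm; gcd_uint(0, x) returns x. */
static unsigned int gcd_uint(unsigned int a, unsigned int b)
{
	while (b) {
		unsigned int t = a % b;

		a = b;
		b = t;
	}
	return a;
}

/*
 * Illustrative only (an assumption, not the patch's implementation):
 * map per-node bandwidths bw[0..n) to small weights out[0..n).
 *
 * Step 1: scale each node's share of the total bandwidth into
 *         [1, WEIGHTINESS], rounding down and clamping 0 up to 1.
 * Step 2: divide every weight by their common GCD, so 16:16 becomes 1:1.
 */
static void sketch_reduce_weights(const unsigned int *bw, uint8_t *out, size_t n)
{
	uint64_t sum = 0;
	unsigned int g = 0;
	size_t i;

	assert(n > 0);

	for (i = 0; i < n; i++)
		sum += bw[i];

	for (i = 0; i < n; i++) {
		/* bw[i] <= sum, so the scaled value never exceeds WEIGHTINESS. */
		uint64_t w = sum ? (uint64_t)WEIGHTINESS * bw[i] / sum : 0;

		out[i] = w ? (uint8_t)w : 1;
		g = gcd_uint(g, out[i]);
	}

	/* Every out[i] is at least 1, so g is at least 1 here. */
	for (i = 0; i < n; i++)
		out[i] /= g;
}
```

Under these assumed rules, the 7639:1345:7 ratio from the commit log would come out as 27:4:1, and two nodes with equal bandwidth would come out as 1:1.
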
>
> There is one functional change that this patch makes to the existing
> weighted_interleave ABI: previously, writing 0 directly to a nodeN
> interface was said to reset the weight to the system default. Before
> this patch, the default for all weights were 1, which meant that writing
> 0 and 1 were functionally equivalent.
>
> This patch introduces "real" defaults, but moves away from letting users
> use 0 as a "set to default" interface. Rather, users who want to use
> system defaults should use auto mode. This patch seems to be the
> appropriate place to make this change, since we would like to remove
> this usage before users begin to rely on the feature in userspace.
> Moreover, users will not be losing any functionality; they can still
> write 1 into a node if they want a weight of 1. Thus, we deprecate the
> "write zero to reset" feature in favor of returning an error, the same
> way we would return an error when the user writes any other invalid
> weight to the interface.
>
> [1] https://lore.kernel.org/linux-mm/20240202170238.90004-1-gregory.price@xxxxxxxxxxxx/
>
> Signed-off-by: Joshua Hahn <joshua.hahnjy@xxxxxxxxx>
> Co-developed-by: Gregory Price <gourry@xxxxxxxxxx>
> Signed-off-by: Gregory Price <gourry@xxxxxxxxxx>
> ---

Hi Joshua,

I'm glad we're close to finalizing the interface. I believe the author
has successfully addressed major concerns through the revisions.
The interface and the code now look good to me.

Reviewed-by: Hyeonggon Yoo <42.hyeyoo@xxxxxxxxx>

With a few nits:

> diff --git a/Documentation/ABI/testing/sysfs-kernel-mm-mempolicy-weighted-interleave b/Documentation/ABI/testing/sysfs-kernel-mm-mempolicy-weighted-interleave
> index 0b7972de04e9..c26879f59d5d 100644
> --- a/Documentation/ABI/testing/sysfs-kernel-mm-mempolicy-weighted-interleave
> +++ b/Documentation/ABI/testing/sysfs-kernel-mm-mempolicy-weighted-interleave
> @@ -20,6 +20,34 @@ Description: Weight configuration interface for nodeN

[...snip...]

> +What: /sys/kernel/mm/mempolicy/weighted_interleave/auto
> +Date: January 2025
> +Contact: Linux memory management mailing list <linux-mm@xxxxxxxxx>
> +Description: Auto-weighting configuration interface
> +
> + Configuration mode for weighted interleave. A 'Y' indicates
> + that the system is in auto mode, and a 'N' indicates that
> + the system is in manual mode. All other values are invalid.
> +
> + In auto mode, all node weights are re-calculated and overwritten
> + (visible via the nodeN interfaces) whenever new bandwidth data
> + is made available during either boot or hotplug events.
> +
> + In manual mode, node weights can only be updated by the user.
> + If a node is hotplugged while the user is in manual mode,
> + the node will have a default weight of 1.
> +
> + Modes can be changed by writing Y, N, 1, or 0 to the interface.
> + All other strings will be ignored, and -EINVAL will be returned.
> + If Y or 1 is written to the interface but the recalculation or
> + updates fail at any point (-ENOMEM or -ENODEV), then the mode
> + will remain in manual mode.

nit: the commit log says that writing 'N' or '0' switches to manual mode
and writing 'Y' or '1' switches to auto mode, but the Documentation does
not explicitly state what '0' and '1' do.

> + Writing a new weight to a node directly via the nodeN interface
> + will also automatically update the system to manual mode.
> diff --git a/drivers/acpi/numa/hmat.c b/drivers/acpi/numa/hmat.c
> index 80a3481c0470..cc94cba112dd 100644
> --- a/drivers/acpi/numa/hmat.c
> +++ b/drivers/acpi/numa/hmat.c
> @@ -20,6 +20,7 @@
>  #include <linux/list_sort.h>
>  #include <linux/memregion.h>
>  #include <linux/memory.h>
> +#include <linux/mempolicy.h>

nit: is this #include directive necessary?

--
Harry