On Fri, 7 Feb 2025 18:20:09 -0800 Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> wrote:

> On Fri, 7 Feb 2025 12:13:35 -0800 Joshua Hahn <joshua.hahnjy@xxxxxxxxx> wrote:
>
> > This patch introduces an auto-configuration mode for the interleave
> > weights that aims to balance the two goals of setting node weights to be
> > proportional to their bandwidths and keeping the weight values low.
> > In order to perform the weight re-scaling, we use an internal
> > "weightiness" value (fixed to 32) that defines interleave aggression.
>
> Question please. How does one determine whether a particular
> configuration is working well? To determine whether
> manual-configuration-A is better than manual-configuration-B is better
> than auto-configuration?
>
> Leading to... how do we know that this patch makes the kernel better?

Hello Andrew,

Thank you for your interest in this patch!

To answer your first question: I think users can experiment with the
specific workloads they expect to run. Since the weights that give the
best results are workload-specific, it makes sense to compare results
across the variety of workloads they expect and see which settings
produce the least throttling.

With that said, this patch introduces defaults that will hopefully help
those who are either unable or uninterested in setting weights
themselves. Users who have already been using weighted interleave and
know which weights they should use might not see as much impact from
the auto settings as someone who is unsure what the best weights are
(and would rather defer the decision-making to the system).

As for measuring the accuracy of the default weights generated: the
auto mode works by taking the nodes' bandwidth data and trying to use
small numbers (between 1 and 255) to approximate those bandwidth values.
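To make the reduction concrete, here is a rough Python sketch of one way
such a reduction could work. It is not the code from the patch: the
function name reduce_weights, the round-down, and the GCD step are
assumptions made for illustration. Only the weightiness of 32 and the
1..255 weight range come from the description above.

```python
from functools import reduce
from math import gcd

WEIGHTINESS = 32   # fixed value named in the patch description
MAX_WEIGHT = 255   # weights must stay in the range 1..255


def reduce_weights(bandwidths, weightiness=WEIGHTINESS):
    """Hypothetical helper: approximate bandwidths with small weights."""
    total = sum(bandwidths)
    if total <= 0:
        raise ValueError("need at least one node with positive bandwidth")
    # Give each node its share of `weightiness`, rounded down and
    # clamped so that no node ends up below 1 or above 255.
    scaled = [min(MAX_WEIGHT, max(1, bw * weightiness // total))
              for bw in bandwidths]
    # Divide out the common factor to keep the values as low as possible.
    divisor = reduce(gcd, scaled)
    return [w // divisor for w in scaled]


print(reduce_weights([19000, 4000, 7000]))  # -> [20, 4, 7]
```

Under these assumptions, a larger weightiness tracks the real bandwidth
ratio more closely, while a smaller one gives lower but coarser weights
(for example, weightiness=8 turns the same input into [5, 1, 1]).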
For instance, [19000, 4000, 7000] might be converted to something like
[4:1:2], since of course we don't want to allocate from the second node
only after 19000 pages have already been allocated from the first. But
at the same time, 4:1:2 is not the same ratio as 19000:4000:7000. So
there is a tradeoff between getting accurate weight values and keeping
them small, so as not to have unbalanced distributions.

This is where we chose 32 as the magic "weightiness" value. Gregory
and I spent quite some time modeling this behavior, trying different
reduction algorithms and weightiness values to see what would represent
the bandwidth data most accurately while using the smallest reasonable
numbers, and ended up with 32. (Earlier versions of this patch also
exposed the weightiness parameter as a sysfs knob, but it was removed
for simplicity's sake.) We've gotten some nice results (under
reasonable conditions) after running exhaustive tests for a wide array
of bandwidth configurations, which is why we were confident in
selecting 32 as the default value.

As for the second question, how this patch makes the kernel better :-)

As I mentioned above, this patch might not have a large impact on those
who already use weighted interleave, see performance gains from it, and
know which weights work best. However, we believe there are users out
there who (1) have nodes with varying bandwidths (CXL), (2) have
workloads that are bandwidth-bound, and (3) would like to take
advantage of weighted interleave but do not have the capacity or are
not willing to manually change the weights.

For these folks, having defaults that make sense (as opposed to the
previous defaults in weighted interleave, which made it functionally
the same as unweighted interleave) can provide more options and
performance gains to those who wish to opt in.

I apologize for the long explanation, but I hope that this answers your
question.
Please let me know if there is anything else I can do!

Thank you again for your interest. I hope you have a great day!
Joshua