Forgot Cc-ing linux-mm@ and linux-kernel@.  Forwarding.  Sorry for the noise.


Thanks,
SJ

=== >8 ===
From: SeongJae Park <sj@xxxxxxxxxx>
To: damon@xxxxxxxxxxxxxxx
CC: SeongJae Park <sj@xxxxxxxxxx>, kernel-team@xxxxxxxx
Subject: Two simple ideas for DAMON accuracy improvement
Message-Id: <20241026215311.148363-1-sj@xxxxxxxxxx>
Date: Sat, 26 Oct 2024 14:53:11 -0700
Local-Date: 2024-10-26 14:53:11-07:00

Hello DAMON community,


There have been a number of much-appreciated questions, concerns, and
improvement ideas around the accuracy of DAMON's monitoring output.  I have
always admitted that DAMON has a lot of room for improvement, but I was a bit
wary of changes for some reasons.  Now I think that caused an unnecessarily
long delay.  Sorry about that.

Now I want to invest some time on the topic.  So I'm starting by sharing the
two simple ideas below.

User-defined Regions Split Factor
---------------------------------

DAMON's "Adaptive Regions Adjustment (ARA)" mechanism splits each region into
randomly sized sub-regions, checks their access temperature, and merges back
adjacent regions having similar temperature.  The split factor, namely the
number of sub-regions that each region is split into, is hard-coded as two.

Increasing the number makes DAMON regions converge to the right shape more
quickly.  However, it also makes the number of DAMON regions in usual
situations higher, and therefore induces more overhead.  The number of regions
will still be kept under the user-defined upper limit (max_nr_regions), though.

The optimum value of the split factor would depend on the use case.  We will
therefore add another knob to let users set the factor at runtime.  The default
value will be two, so this will not introduce any regression or behavioral
change to existing users.

Periodic Fine-grain Split of Aged Regions
-----------------------------------------

If a region is continuously changing its boundary and access temperature, it
means the region is converging, or the access pattern of the workload is not
stabilized.  In either case, this is a healthy signal.

If a region is consistently showing the same access pattern for a long time, it
may be because the access pattern is stabilized and the region has correctly
converged.  However, it might also be because the access pattern has changed,
but the convergence is slow.

To avoid too slow convergence of aged regions, we will let users periodically
increase the split factor for regions that have kept their current access
pattern for a long time (high 'age').  Users will be able to set the 'age'
threshold, the split factor for the aged regions, and the time interval between
the periodic fine-grain splits of the regions.  For example, users can ask
DAMON to "split regions that have kept their current access pattern for ten
minutes or longer into five sub-regions every minute".

The feature will be disabled unless users explicitly set those, so it does not
introduce any regression or behavioral change to existing users.

Discussions
-----------

Someone might worry that these are adding too many knobs.  As I shared in the
long-term plan at the last LPC[1], we will keep supporting those new knobs in
the long term, and may introduce an auto-tuning feature in the future.  By
making these user-tunable first, we can collect experiment results and use
those for the future improvements.  Anyway, by design these changes will not
introduce any regression or behavioral change to existing users, so I believe
these are safe to add.

One of the factors that delayed my work on this topic was the absence of a
formal DAMON accuracy evaluation method.  Using damon-tests, we were able to do
the evaluation by drawing heatmaps of test workloads and comparing those from
different versions of DAMON.  Comparing the results of several DAMOS schemes on
test workloads was also one way for that.  But those are not formal.  We still
don't have a formal way of evaluating accuracy.  However, the two features will
introduce no regression to existing users, so I believe this is the path
forward for now.

I believe implementing the features would not be difficult.
So unless someone voluntarily steps up, I will start implementing the features,
targeting the v6.14 merge window.

I'm looking forward to any comments.

[1] https://lpc.events/event/18/contributions/1768/


Thanks,
SJ
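
For discussion purposes only, the two ideas above can be sketched in
user-space Python as below.  This is a rough illustration under stated
assumptions, not DAMON's kernel code: the names (Region, split_region,
split_all, min_size, the aged tuple) are all hypothetical, 'age' is counted in
seconds here for simplicity, and the caller is assumed to pass the aged
parameters only when the periodic fine-grain split interval has elapsed.

```python
import random
from dataclasses import dataclass


@dataclass
class Region:
    start: int    # inclusive start address
    end: int      # exclusive end address
    age: int = 0  # seconds the current access pattern was kept (assumption)


def split_region(region, factor, rng, min_size=4096):
    """Split a region into up to `factor` randomly sized sub-regions."""
    size = region.end - region.start
    # Number of min_size-granularity cut positions inside the region.
    slots = size // min_size - 1
    nr_cuts = min(factor - 1, slots)
    if nr_cuts <= 0:
        return [region]
    # Pick distinct random cut positions, then build contiguous pieces.
    cuts = sorted(rng.sample(range(1, slots + 1), nr_cuts))
    bounds = ([region.start]
              + [region.start + c * min_size for c in cuts]
              + [region.end])
    # Assumption of this sketch: sub-regions inherit the parent's age.
    return [Region(s, e, region.age) for s, e in zip(bounds, bounds[1:])]


def split_all(regions, split_factor, max_nr_regions, rng, aged=None):
    """Split every region once, never exceeding max_nr_regions in total.

    aged: None, or an (age_threshold, aged_split_factor) tuple that the
    caller passes only when a periodic fine-grain split is due.
    """
    out = []
    # How many additional regions may still be created.
    budget = max(max_nr_regions - len(regions), 0)
    for r in regions:
        factor = split_factor
        if aged is not None and r.age >= aged[0]:
            factor = aged[1]
        factor = min(factor, budget + 1)
        subs = split_region(r, factor, rng)
        budget -= len(subs) - 1
        out.extend(subs)
    return out
```

With this sketch, the example in the mail ("ten minutes or longer into five
sub-regions every minute") would correspond to calling split_all() with
aged=(600, 5) once a minute and with aged=None otherwise, while split_factor
stays at the default of two.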