DAMON requires time-consuming and repetitive aggregation interval tuning. Introduce a feature for automating it using a feedback loop that aims an amount of observed access events, like auto-exposing cameras. Background: Access Frequency Monitoring and Aggregation Interval ================================================================ DAMON checks if each memory element (damon_region) is accessed or not for every user-specified time interval called 'sampling interval'. It aggregates the check intervals on per-element counter called 'nr_accesses'. DAMON users can read the counters to get the access temperature of a given element. The counters are reset for every another user-specified time interval called 'aggregation interval'. This can be illustrated as DAMON continuously capturing a snapshot of access events that happen and captured within the last aggregation interval. This implies the aggregation interval plays a key role for the quality of the snapshots, like the camera exposure time. If it is too short, the amount of access events that happened and captured for each snapshot is small, so each snapshot will show no many interesting things but just a cold and dark world with hopefuly one pale blue dot or two. If it is too long, too many events are aggregated in a single shot, so each snapshot will look like world of flames, or Muspellheim. It will be difficult to find practical insights in both cases. Problem: Time Consuming and Repetitive Tuning ============================================= The appropriate length of the aggregation interval depends on how frequently the system and workloads are making access events that DAMON can observe. Hence, users have to tune the interval with excessive amount of tests with the target system and workloads. If the system and workloads are changed, the tuning should be done again. If the characteristic of the workloads is dynamic, it becomes more challenging. It is therefore time-consuming and repetitive. The tuning challenge mainly stems from the wrong question. It is not asking users what quality of monitoring results they want, but how DAMON should operate for their hidden goal. To make the right answer, users need to fully understand DAMON's mechanisms and the characteristics of their workloads. Users shouldn't be asked to understand the underlying mechanism. Understanding the characteristics of the workloads shouldn't be the role of users but DAMON. Aim-oriented Feedback-driven Auto-Tuning ========================================= Fortunately, the appropriate length of the aggregation interval can be inferred using a feedback loop. If the current snapshots are showing no much intresting information, in other words, if it shows only rare access events, increasing the aggregation interval helps, and vice versa. We tested this theory on a few real-world workloads, and documented one of the experience with an official DAMON monitoring intervals tuning guideline. Since it is a simple theory that requires repeatable tries, it can be a good job for machines. Based on the guideline's theory, we design an automation of aggregation interval tuning, in a way similar to that of camera auto-exposure feature. It defines the amount of interesting information as the ratio of captured access events to total capturing attempts of single snapshot, or more technically speaking, the ratio of positive access check samples to total samples within the aggregation interval. It allows the users to set the target value of the ratio. Once the target is set, the automation periodically measures the current value of the ratio and increase or decrease the aggregation interval if the ratio value is lower or higher than the target. The amount of the change is proportion to the distance between current value and the target value. To avoid auto-tuning goes too long way, let users set minimum and maximum aggregation interval time. Changing only aggregation interval while sampling interval is kept make the maximum level of access frequency in each snapshot, or discernment of regions inconsistent. Also, unnecessarily short sampling interval causes meaningless monitoring overhed. The automation therefore adjusts the sampling interval together with aggregation interval, while keeping the ratio between the two intervals. Users can set the ratio, or the discernment. Discussion ========== The modified question (aimed amount of heats in each snapshot) is easy to answer by both the users and the kernel. If users are interested in finding more cold regions, the value should be lower, and vice versa. If users have no idea, kernel can suggest about 20% positive access samples ratio as a fair default value based on the Pareto principle. Sampling to aggregation intervals ratio and min/max aggregation intervals are also arguably easy to answer. What users want is discernment of regions for efficient system operation, for examples, X amount of colder regions or Y amount of warmer regions, not exactly how many times each cache line is accessed in nanoseconds degree. The appropriate min/max aggregation interval can relatively naively set, and may better to set for aimed monitoring overhead. Since sampling interval is directly related with the overhead, setting it based on the sampling interval can be easy. With my experiences, I'd argue the intervals ratio 0.05, and 5 milliseconds to 20 seconds sampling interval range (100 milliseconds to 400 seconds aggregation interval) can be a good default suggestions. Evaluation ========== We confirmed the tuning works as expected with only a few simple workloads including kernel builds, and that's why this is an RFC. We will conduct more evaluations with more massive and realistic workloads and share the results by the time that we drop the RFC tag. SeongJae Park (8): mm/damon: add data structure for monitoring intervals auto-tuning mm/damon/core: implement intervals auto-tuning mm/damon/sysfs: implement intervals tuning goal directory mm/damon/sysfs: commit intervals tuning goal mm/damon/sysfs: implement a command to update auto-tuned monitoring intervals Docs/mm/damon/design: document for intervals auto-tuning Docs/ABI/damon: document intervals auto-tuning ABI Docs/admin-guide/mm/damon/usage: add intervals_goal directory on the hierarchy .../ABI/testing/sysfs-kernel-mm-damon | 30 +++ Documentation/admin-guide/mm/damon/usage.rst | 25 ++ Documentation/mm/damon/design.rst | 38 +++ include/linux/damon.h | 43 ++++ mm/damon/core.c | 90 ++++++++ mm/damon/sysfs.c | 216 ++++++++++++++++++ 6 files changed, 442 insertions(+) base-commit: d5c35650f4945e1406871f9d9d51ab8c54ec0d03 -- 2.39.5