From: SeongJae Park <sjpark@xxxxxxxxx>

Hello Fernand,

Thank you for the questions!

On Tue, 25 May 2021 17:17:05 +0200 <sieberf@xxxxxxxxxx> wrote:

> Hi SeongJae,
>
> The code looks good. Some questions for this patch:
>
> The region merge threshold is computed on the access diff. Should the
> diff threshold be exponential as diffs in low number of access are
> likely to be more important? I.e if the threshold is 5, a region A with
> 0 accesses will be merged with a region B with 4 accesses (diff=4), but
> a region C with 50 access won't be merged with a region D with 60
> accesses (diff=10), however it seems to me that keeping a good
> granularity between A and B is more important than between C and D for
> FPR. What do you think?

That totally makes sense if we are interested only in cold pages. However,
DAMON is for more general use cases. In some cases, people would be interested
in hot pages. Using an exponential diff might make region merging more
aggressive and result in smaller overhead. But I think the size of the problem
and of the benefit is unclear for now. I have not seen the overhead become
problematically high in my tests with production systems. I think we could add
another option for this later, after we find it becomes a real problem.

>
> When the number of regions is less than half max region, region split
> kicks in and doubles the number of region. This means that the number of
> region will grow close to max region, then slowly decay as region
> merges, until it reaches half max regions, then double again. This seems
> to create a non-uniform region number distribution over time, with large
> cycles. Also we do a lot of work when we double and no work otherwise.
> Not sure what's the impact on measurement quality but intuitively seems
> like keeping the number of regions constant over time would yield more
> consistent metrics? How about we rather always split regions at each
> iteration, and for each region we give a split probability?

Agreed, I think this makes sense. I am also planning to make the probability
change adaptively based on the current monitoring results in the future.
Nevertheless, I want to keep the logic as simple as possible for now, unless we
see a clear problem and benefit there.


Thanks,
SeongJae Park

>
> Kind regards,
>
> --Fernand
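
For reference, the merge example from the first question can be sketched in
Python as below. This is only an illustration and not the DAMON code: the
function names, the "at most the threshold" comparison, and the 0.2 ratio are
assumptions made for this sketch, and the relative rule is just one possible
reading of the "exponential" threshold idea.

```python
def should_merge_absolute(a: int, b: int, threshold: int) -> bool:
    """Merge when the access-count difference is at most `threshold`.

    Assumption for this sketch: the comparison is "less than or equal".
    """
    return abs(a - b) <= threshold


def should_merge_relative(a: int, b: int, ratio: float) -> bool:
    """Hypothetical alternative: merge when the difference is small
    relative to the larger of the two access counts."""
    larger = max(a, b)
    if larger == 0:
        # Two regions with no accesses at all look identical.
        return True
    return abs(a - b) / larger <= ratio


# Access counts taken from the example in the question.
pairs = {"A/B": (0, 4), "C/D": (50, 60)}

for name, (x, y) in pairs.items():
    print(
        name,
        "absolute:", should_merge_absolute(x, y, threshold=5),
        "relative:", should_merge_relative(x, y, ratio=0.2),
    )
```

With the absolute rule and a threshold of 5, A and B are merged while C and D
are kept apart. The relative rule reverses both decisions, which favors
granularity among rarely accessed regions at the cost of granularity among
frequently accessed ones.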