Hi SeongJae,
The code looks good. Some questions for this patch:
The region merge threshold is computed on the access diff. Should the
diff threshold be exponential as diffs in low number of access are
likely to be more important? I.e if the threshold is 5, a region A with
0 accesses will be merged with a region B with 4 accesses (diff=4), but
a region C with 50 access won't be merged with a region D with 60
accesses (diff=10), however it seems to me that keeping a good
granularity between A and B is more important than between C and D for
FPR. What do you think?
When the number of regions is less than half max region, region split
kicks in and doubles the number of region. This means that the number of
region will grow close to max region, then slowly decay as region
merges, until it reaches half max regions, then double again. This seems
to create a non-uniform region number distribution over time, with large
cycles. Also we do a lot of work when we double and no work otherwise.
Not sure what's the impact on measurement quality but intuitively seems
like keeping the number of regions constant over time would yield more
consistent metrics? How about we rather always split regions at each
iteration, and for each region we give a split probability?
Kind regards,
--Fernand