Re: [LSF/MM/BPF TOPIC v2] Unifying sources of page temperature information - what info is actually wanted?

SeongJae Park <sj@xxxxxxxxxx> · Fri, 21 Mar 2025 10:36:19 -0700

On Fri, 21 Mar 2025 15:30:44 +0000 Jonathan Cameron <Jonathan.Cameron@xxxxxxxxxx> wrote:

Thank you for your nice comments.  I agree with all your points, and am adding
just a few more details below.

[...]
> Whilst I'm not in a position to share the data, as it's not mine :( I've
> seen graphs that show that for at least some use cases, even if we have some
> contiguity of hotness in the VA space, it looks like noise in PA.  So
> I think this is a case of 'mileage may vary'. DAMON works great sometimes but
> sometimes the spread of access statistics happens to be wrong.

100% agree.  Your findings and conclusions match mine.  Nevertheless, we are
trying to find out why and when it works well or badly, and to make it work
better in more cases.  So far, we have found that better visualization methods
and DAMON parameter tuning can help.  We are therefore adding more
visualization methods and DAMON parameter auto-tuning.  It is still far from
perfect, but it will keep getting closer to the north star if we, the
community, work together.

[...]
> > >  	b) Metadata beyond the counts is useful
> > > 	   https://lore.kernel.org/all/87h64u2xkh.fsf@DESKTOP-5N7EMDA/
> > > 	   Promotion algorithms can need aggregate statistics for a memory 
> > > 	   device to decide how much to move.  
> > 
> > DAMOS quotas goal feature is a sort of a feature for this question.  It allows
> > users to set target metric and value, and tune the aggressiveness.  For
> > promotions and demotions, I suggested using upper tier utilization and free
> > ratio as such possible goal metric, and gonna post an implementation for that
> > soon.
> 
> Those are certainly good metrics to consider, but I think we definitely also
> need a metric around how beneficial are the moves being made.
> 
> That matters more on the promotion path, because that interrupts access to
> hot data and so will cause a temporary drop in performance / latency spike.

Good point, and agreed.  I think we can, and should, continue developing such
better metrics together.

And I think the DAMOS quota goal is a feature that can easily be used for
prototyping, experimenting with, and hopefully productionizing such new
metrics.  The feature is easy to extend with new metrics, and it also supports
setting multiple goals.  It also lets users directly feed arbitrary input to
the feedback loop.
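
To make the feedback-loop idea a bit more concrete, below is a rough sketch in
Python.  It is only an illustration and not the kernel implementation: the
function names, the basis-point scale, and the rule of taking the highest
achievement ratio among multiple goals are all assumptions made for this
example.  Each goal is a (current_value, target_value) pair, which is also how
a user-fed arbitrary metric could be expressed.

```python
# Illustrative sketch only; names and scaling are assumptions, not DAMON code.
BP = 10000  # "goal fully achieved" expressed in basis points


def goal_score(goals):
    """Return the highest achievement ratio among the goals, in basis points.

    Each goal is a (current_value, target_value) pair with target_value > 0.
    """
    return max(current * BP // target for current, target in goals)


def next_quota(last_quota, goals, min_quota=1):
    """Proportionally adjust the quota based on how far the score is from BP."""
    score = goal_score(goals)
    compensation = last_quota * abs(BP - score) // BP
    if score < BP:
        # Goals are under-achieved: be more aggressive.
        return last_quota + compensation
    # Goals are met or over-achieved: back off, but never below min_quota.
    return max(min_quota, last_quota - compensation)


if __name__ == "__main__":
    quota = 1000
    # Hypothetical user-fed metric that is at 50% of its target.
    quota = next_quota(quota, [(50, 100)])
    print(quota)
```

With a single goal at half of its target, the quota grows by half.  Once the
goal is over-achieved, the same rule shrinks the quota again, which is the
self-correcting behavior a promotion or demotion policy would rely on.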

> 
> > 
> > > 
> > > As noted above, this may well overlap with other sessions.
> > > One outcome of the discussion so far is to highlight what I think many
> > > already knew.  This is hard!  
> > 
> > Indeed.  Keeping more people on the same page is important and difficult.
> > Thank you for your effort again, and looking forward to discuss in more depth!
> >
> 
> I'm not sure we'll succeed.  This may well be a wild west situation for a while
> yet, but hopefully we can slowly converge or at least build some common
> parts.

I'm very sure this session will be an important step on the journey :)

> 
> Jonathan
> 
> p.s. Heathrow disruption means I'm crossing my fingers on actually getting to
> Montreal.

I hope it all goes well for you!

Thanks,
SJ

[...]