Re: [LSF/MM/BPF TOPIC] Unifying sources of page temperature information - what info is actually wanted?

On Wed, Feb 05, 2025 at 11:54:05AM +0530, Bharata B Rao wrote:
> On 31-Jan-25 6:39 PM, Jonathan Cameron wrote:
> > On Fri, 31 Jan 2025 12:28:03 +0000
> > Jonathan Cameron <Jonathan.Cameron@xxxxxxxxxx> wrote:
> > 
> >>> Here is the list of potential discussion points:
> >> ...
> >>
> >>> 2. Possibility of maintaining single source of truth for page hotness that would
> >>> maintain hot page information from multiple sources and let other sub-systems
> >>> use that info.
> >> Hi,
> >>
> >> I was thinking of proposing a separate topic on a single source of hotness,
> >> but this question covers it so I'll add some thoughts here instead.
> >> I think we are very early, but sharing some experience and thoughts in a
> >> session may be useful.
> > 
> > Thinking more on this over lunch, I think it is worth calling this out as a
> > potential session topic in its own right rather than trying to find
> > time within other sessions.  Hence the title change.
> > 
> > I think a session would start with a brief listing of the temperature sources
> > we have and those on the horizon to motivate what we are unifying, then
> > discussion to focus on need for such a unification + requirements
> > (maybe with a straw man).
> 
> Here is a compilation of available temperature sources and how the 
> hot/access data is consumed by different subsystems:

This is super useful, thanks for collecting this.

> PA-Physical address available
> VA-Virtual address available
> AA-Access time available
> NA-accessing Node info available
> 
> I have left the slot blank for those which I am not sure about.
> ==================================================
> Temperature		PA	VA	AA	NA
> source
> ==================================================
> PROT_NONE faults	Y	Y	Y	Y
> --------------------------------------------------
> folio_mark_accessed()	Y		Y	Y
> --------------------------------------------------

For folio_mark_accessed() (fma), the VA is available on the unmap path,
but elsewhere it usually isn't, or doesn't meaningfully exist, as in the
case of unmapped buffered IO.

I'd say it's an N.

> PTE A bit		Y	Y	N	N
> --------------------------------------------------
> Platform hints		Y	Y	Y	Y
> (AMD IBS)
> --------------------------------------------------
> Device hints		Y
> (CXL HMU)
> ==================================================
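
One way to read the table above is as a per-source capability mask. A
rough userspace sketch, not kernel code: every name here (temp_cap,
temp_source, temp_source_provides, ...) is made up for illustration.
It takes the fma() VA slot as N per my comment, and tracks the blank
slots as "unknown" rather than guessing them.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical capability bits, one per column of the table. */
enum temp_cap {
	TEMP_CAP_PA = 1 << 0,	/* physical address available */
	TEMP_CAP_VA = 1 << 1,	/* virtual address available */
	TEMP_CAP_AA = 1 << 2,	/* access time available */
	TEMP_CAP_NA = 1 << 3,	/* accessing node info available */
};
#define TEMP_CAP_ALL (TEMP_CAP_PA | TEMP_CAP_VA | TEMP_CAP_AA | TEMP_CAP_NA)

struct temp_source {
	const char *name;
	unsigned int yes;	/* cells marked Y */
	unsigned int known;	/* cells marked Y or N; the rest were left blank */
};

/* Rows in table order. Assumption: fma() VA is N, as suggested above. */
static const struct temp_source temp_sources[] = {
	{ "PROT_NONE faults", TEMP_CAP_ALL, TEMP_CAP_ALL },
	{ "folio_mark_accessed()",
	  TEMP_CAP_PA | TEMP_CAP_AA | TEMP_CAP_NA, TEMP_CAP_ALL },
	{ "PTE A bit", TEMP_CAP_PA | TEMP_CAP_VA, TEMP_CAP_ALL },
	{ "Platform hints (AMD IBS)", TEMP_CAP_ALL, TEMP_CAP_ALL },
	{ "Device hints (CXL HMU)", TEMP_CAP_PA, TEMP_CAP_PA },
};

/* Nonzero if the source is marked Y for every capability in @need. */
static int temp_source_provides(const struct temp_source *s, unsigned int need)
{
	assert(s != NULL);
	return (s->yes & need) == need;
}

/* Nonzero if any capability in @need was left blank for this source. */
static int temp_source_unknown(const struct temp_source *s, unsigned int need)
{
	assert(s != NULL);
	return (need & ~s->known) != 0;
}
```

A consumer that needs, say, VA plus accessing-node info could then filter
sources with temp_source_provides(s, TEMP_CAP_VA | TEMP_CAP_NA), and treat
temp_source_unknown() hits as open questions for the session.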

For the following table, it might be useful to add *when* the source
produces this information. Sampling frequency is a likely challenge:
consumers have different requirements, and overhead should be limited
to the minimum required to serve enabled consumers.

Here is an (incomplete) attempt - sorry about the long lines:

> And here is an attempt to compile how different subsystems
> use the above data:
> ========================================================================================================
> Source                  Subsystem       Consumption                 Activation/Frequency
> ========================================================================================================
> PROT_NONE faults        NUMAB           NUMAB=1 locality based      While task is running,
> via process pgtable                     balancing                   rate varies on observed
> walk                                    NUMAB=2 hot page            locality and sysctl knobs.
>                                         promotion
> ========================================================================================================
> folio_mark_accessed()   FS/filemap/GUP  LRU list activation         On cache access and unmap
> ========================================================================================================
> PTE A bit via           Reclaim:LRU     LRU list activation,        During memory pressure
> rmap walk                               deactivation/demotion
> ========================================================================================================
> PTE A bit via           Reclaim:MGLRU   LRU list activation,        - During memory pressure
> rmap walk and process                   deactivation/demotion       - Continuous sampling (configurable)
> pgtable walk                                                          for workingset reporting
> ========================================================================================================
> PTE A bit via           DAMON           LRU activation,             Continuous sampling (configurable)?
> rmap walk                               hot page promotion,         (I believe SJ is looking into
>                                         demotion etc                auto-tuning this).
> ========================================================================================================
> Platform hints          NUMAB           NUMAB=1 Locality based
> (AMD IBS)                               balancing and
>                                         NUMAB=2 hot page
>                                         promotion
> ========================================================================================================
> Device hints            NUMAB           NUMAB=2 hot page
>                                         promotion
> ========================================================================================================
> The last two are listed as possibilities.
> 
> Feel free to correct/clarify and add more.
> 
> Regards,
> Bharata.
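
On the sampling frequency point: one possible rule is that a source runs
only as fast as its most demanding *enabled* consumer needs, and not at
all when nothing is enabled. A minimal sketch under that assumption; the
struct, the function name and the per-consumer max_interval_ms field are
all hypothetical, not an existing interface.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of one consumer of a temperature source. */
struct temp_consumer {
	const char *name;
	int enabled;			/* nonzero if currently in use */
	unsigned int max_interval_ms;	/* longest sampling interval it tolerates;
					 * assumed nonzero */
};

/*
 * Pick the sampling interval for a source: the shortest interval
 * requested by any enabled consumer. Returns 0 when no consumer is
 * enabled, meaning the source should not sample at all.
 */
static unsigned int temp_sample_interval_ms(const struct temp_consumer *c,
					    size_t n)
{
	unsigned int interval = 0;
	size_t i;

	assert(c != NULL || n == 0);

	for (i = 0; i < n; i++) {
		if (!c[i].enabled)
			continue;
		if (interval == 0 || c[i].max_interval_ms < interval)
			interval = c[i].max_interval_ms;
	}
	return interval;
}
```

Disabled consumers don't tighten the interval, which is the "limited to
the minimum required to serve enabled consumers" part. Whether a single
interval per source is expressive enough is a separate question.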



