On Wed, Feb 05, 2025 at 11:54:05AM +0530, Bharata B Rao wrote:
> On 31-Jan-25 6:39 PM, Jonathan Cameron wrote:
> > On Fri, 31 Jan 2025 12:28:03 +0000
> > Jonathan Cameron <Jonathan.Cameron@xxxxxxxxxx> wrote:
> >
> >>> Here is the list of potential discussion points:
> >> ...
> >>
> >>> 2. Possibility of maintaining single source of truth for page hotness that would
> >>> maintain hot page information from multiple sources and let other sub-systems
> >>> use that info.
> >> Hi,
> >>
> >> I was thinking of proposing a separate topic on a single source of hotness,
> >> but this question covers it so I'll add some thoughts here instead.
> >> I think we are very early, but sharing some experience and thoughts in a
> >> session may be useful.
> >
> > Thinking more on this over lunch, I think it is worth calling this out as a
> > potential session topic in it's own right rather than trying to find
> > time within other sessions. Hence the title change.
> >
> > I think a session would start with a brief listing of the temperature sources
> > we have and those on the horizon to motivate what we are unifying, then
> > discussion to focus on need for such a unification + requirements
> > (maybe with a straw man).
>
> Here is a compilation of available temperature sources and how the
> hot/access data is consumed by different subsystems:

This is super useful, thanks for collecting this.

> PA-Physical address available
> VA-Virtual address available
> AA-Access time available
> NA-accessing Node info available
>
> I have left the slot blank for those which I am not sure about.
> ==================================================
> Temperature              PA    VA    AA    NA
> source
> ==================================================
> PROT_NONE faults         Y     Y     Y     Y
> --------------------------------------------------
> folio_mark_accessed()    Y           Y     Y
> --------------------------------------------------

For fma(), the VA info is available in unmap, but usually it isn't -
or doesn't meaningfully exist, as in the case of unmapped buffered IO.

I'd say it's an N.

> PTE A bit                Y     Y     N     N
> --------------------------------------------------
> Platform hints           Y     Y     Y     Y
> (AMD IBS)
> --------------------------------------------------
> Device hints             Y
> (CXL HMU)
> ==================================================

For the following table, it might be useful to add *when* the source
produces this information. Sampling frequency is a likely challenge:
consumers have different requirements, and overhead should be limited
to the minimum required to serve enabled consumers.

Here is an (incomplete) attempt - sorry about the long lines:

> And here is an attempt to compile how different subsystems
> use the above data:
> ==============================================================
> Source                 Subsystem       Consumption             Activation/Frequency
> ==============================================================
> PROT_NONE faults       NUMAB           NUMAB=1 locality based  While task is running,
> via process pgtable                    balancing               rate varies on observed
> walk                                   NUMAB=2 hot page        locality and sysctl knobs.
>                                        promotion
> ==============================================================
> folio_mark_accessed()  FS/filemap/GUP  LRU list activation     On cache access and unmap
> ==============================================================
> PTE A bit via          Reclaim:LRU     LRU list activation,    During memory pressure
> rmap walk                              deactivation/demotion
> ==============================================================
> PTE A bit via          Reclaim:MGLRU   LRU list activation,    - During memory pressure
> rmap walk and process                  deactivation/demotion   - Continuous sampling (configurable)
> pgtable walk                                                     for workingset reporting
> ==============================================================
> PTE A bit via          DAMON           LRU activation,         Continuous sampling (configurable)?
> rmap walk                              hot page promotion,     (I believe SJ is looking into
>                                        demotion etc            auto-tuning this).
> ==============================================================
> Platform hints         NUMAB           NUMAB=1 Locality based
> (AMD IBS)                              balancing and
>                                        NUMAB=2 hot page
>                                        promotion
> ==============================================================
> Device hints           NUMAB           NUMAB=2 hot page
>                                        promotion
> ==============================================================
> The last two are listed as possibilities.
>
> Feel free to correct/clarify and add more.
>
> Regards,
> Bharata.
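
To make the PA/VA/AA/NA table a bit more concrete, here is a rough sketch of how the columns could be modelled as capability flags, so that a consumer can ask which sources provide what it needs. This is only an illustration in Python, not kernel code: the names `Cap`, `SOURCES` and `sources_providing` are made up for this example, blank slots are treated as "not available", and folio_mark_accessed() VA is taken as N per the comment above.

```python
from enum import Flag, auto


class Cap(Flag):
    """Hypothetical capability flags mirroring the table columns."""
    PA = auto()  # physical address available
    VA = auto()  # virtual address available
    AA = auto()  # access time available
    NA = auto()  # accessing node info available


# Capabilities per temperature source, following the first table.
# Assumption: blank slots are treated as "not available".
SOURCES = {
    "PROT_NONE faults": Cap.PA | Cap.VA | Cap.AA | Cap.NA,
    "folio_mark_accessed()": Cap.PA | Cap.AA | Cap.NA,
    "PTE A bit": Cap.PA | Cap.VA,
    "Platform hints (AMD IBS)": Cap.PA | Cap.VA | Cap.AA | Cap.NA,
    "Device hints (CXL HMU)": Cap.PA,
}


def sources_providing(required):
    """Return the sources whose capabilities include every flag in `required`."""
    return [name for name, caps in SOURCES.items()
            if (caps & required) == required]


if __name__ == "__main__":
    # Which sources report both a physical address and the accessing node?
    for name in sources_providing(Cap.PA | Cap.NA):
        print(name)
```

A consumer that needs both the physical address and the accessing node would query `sources_providing(Cap.PA | Cap.NA)`. A real unified interface would presumably also have to carry the activation/frequency information from the second table, which this sketch leaves out.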