On 2/8/2023 11:33 PM, Peter Zijlstra wrote:
> On Wed, Feb 08, 2023 at 01:05:28PM +0530, Bharata B Rao wrote:
>
>> - Perf uses IBS and we are using the same IBS for access profiling here.
>> There needs to be a proper way to make the use mutually exclusive.
>
> No, IFF this lives it needs to use in-kernel perf.

In fact, I started out with in-kernel perf by using the
perf_event_create_kernel_counter() API. However, there are issues with
using in-kernel perf:

- We want to reprogram the counter potentially on every context switch.
  The IBS hardware sample counter needs to be reprogrammed based on the
  incoming thread's view of the sample period. Additionally, sampling
  needs to be disabled for kernel threads. So I wanted to use
  perf_event_enable/disable() and perf_event_period(). However, they take
  mutexes, so they cannot be called from the atomic sched switch context.

- In-kernel perf gives a per-CPU counter, but we want it to count based
  on the task that is currently running, i.e., the period should be
  modified on a per-task basis. I don't see how an in-kernel perf event
  counter can be associated with a task in this way.

Hence, I didn't see an easy option other than making the use of IBS by
perf and by NUMA balancing mutually exclusive.

>
>> - Is tying this up with NUMA balancing a reasonable approach or
>> should we look at a completely new approach?
>
> Is it giving sufficient win to be worth it, afaict it doesn't come even
> close to justifying it.
>
>> - Hardware provided access information could be very useful for driving
>> hot page promotion in tiered memory systems. Need to check if this
>> requires different tuning/heuristics apart from what NUMA balancing
>> already does.
>
> I think Huang Ying looked at that from the Intel POV and I think the
> conclusion was that it doesn't really work out. What you need is
> frequency information, but the PMU doesn't really give you that. You
> need to process a *ton* of PMU data in-kernel.
What I am doing here is feeding the access data into NUMA balancing, which
already has the logic to aggregate it at the task and NUMA group level and
decide whether an access is actionable in terms of migrating the page. In
this context, I am not sure about the frequency information that you and
Dave are mentioning. AFAIU, existing NUMA balancing takes care of taking
action; IBS simply becomes an alternative source of access information to
NUMA hint faults.

Thanks for your inputs.

Regards,
Bharata.