On Mon, Feb 24, 2025 at 04:05:43PM +0000, Mark Rutland wrote: [...] > > > > Just a minor concern. I don't see any handling for task migration. > > > > E.g., for a task is migrated from one CPU to another CPU, I expect we > > > > need to save and restore branch records based on BRBE injection. So > > > > far, the driver simply invalidates all records. > > > > > > > > I think this topic is very likely discussed before. If this is the > > > > case, please ignore my comment. Except this, the code looks good > > > > to me. > > > > > > Not really discussed on the list, but that was present in v18 (though > > > not functional because .sched_task() hook wasn't actually enabled) and > > > Mark removed it. His work is here[1].The only comment was: > > > > > > Note: saving/restoring at context-switch doesn't interact well with > > > event rotation (e.g. if filters change) > > > > In the brbe_enable() function, it "Merge the permitted branch filters > > of all events". Based on current implementation, all events share the > > same branch filter. > > Critically, the brbe_enable() function merges the filters of all > *active* events which have been installed into hardware. It does not > track all events which can be rotated, and the resulting filter is not > the same -- it can change as a result of rotation. In a perf session has multiple events, and events have different branch filters, seems to me, a simple way is to return error for this case. I would argue BRBE is an IP for recording branches per CPU wise, it does not support recording for event wise. If we can unify branch filter within a perf session, would this be much easier for handling? > > When event rotation happens, if without context switch, in theory we > > should can directly use the branch record (no invalidation, no injection) > > for all events. > > No; that only works in *some* cases, and will produce incorrect results > in others. > > For example, consider filtering. Imagine a PMU with a single counter, > and two events, where event-A filters for calls-and-returns and event-B > filters for calls-only. When switching from event-A to event-B, it's > theoretically possible to keep the existing records around, knowing that > the returns can be filtered out later. When switching from event-B to > event-A we cannot keep the existing records, since there are gaps > whenever a return should have been recorded. Seems to me, the problem is not caused by event rotation. We need to calculate a correct filter in the first place - the BRBE driver should calculate a superset for all filters of events for a session. Then, generate branch record based event's specific filter. > There are a number of cases of that shape given the set of configurable > filters. In theory it's possible to retain those in some cases, but I > don't think that the complexity is justified. > > Similarly, whenever kernel branches are recorded it's necessary to drop > the stale branches whenever branch recording is paused, as there's > necessarily a blackout period and hence a gap in the records. If we save BRBE record when a process is switched out and then restore the record when a process is switched in, should we can keep a decent branch record for performance profiling? I understand it might be many things happen in the middle of a task switching or migration, but it is fine for not recording branches during the blackout period. The missed branch records are not very helpful for forming a flow for a profiled program itself, especially, if we consider we will optimize userspace program in many cases. > Do you think that you have a case where losing branches across rotation > *really* matters? I agreed that event rotation case might be rare and complex. Please see a comment below. > > For a context-switch case, we need to save and re-inject branch record. > > BRBE record sticks to a process context, no matter what events have been > > enabled. > > I had originally wanted to keep per-event records around, but it doesn't > work in all cases. One reason events get discarded at context-switch > time is that CPU-bound events can sample branches, and would > mis-attribute stale userspace branches to the wrong context when > switching tasks. There are explicit comments about this in > amd_pmu_brs_sched_task() and intel_pmu_lbr_sched_task(). > > Given we discard records when reprogramming events, we *could* try to > preserve events in some cases, but I suspect that as with the rotation > case this'll be a lot of complexity for little gain. Note that as we > discard events when enabling the PMU, we'd throw some task-bound records > away anyway, and practically the gain would be limited to cpu-bound > records. > > Do you have a reason why you think we *must* keep events around? Here I am really concerned are cases when a process is preempted or migrated. The driver doesn't save and restore branch records for these cases, it just invalidates all records when a task is scheduled in. As a result, if an event overflow is close to context switch, it is likely to capture incomplete branch records. For a userspace-only tracing, it is risk to capture empty branch record after preemption and migrations. Thanks, Leo