Re: [PATCH v20 11/11] perf: arm_pmuv3: Add support for the Branch Record Buffer Extension (BRBE)

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Feb 24, 2025 at 04:05:43PM +0000, Mark Rutland wrote:

[...]

> > > > Just a minor concern.  I don't see any handling for task migration.
> > > > E.g., for a task is migrated from one CPU to another CPU, I expect we
> > > > need to save and restore branch records based on BRBE injection.  So
> > > > far, the driver simply invalidates all records.
> > > >
> > > > I think this topic is very likely discussed before.  If this is the
> > > > case, please ignore my comment.  Except this, the code looks good
> > > > to me.
> > > 
> > > Not really discussed on the list, but that was present in v18 (though
> > > not functional because .sched_task() hook wasn't actually enabled) and
> > > Mark removed it. His work is here[1].The only comment was:
> > > 
> > > Note: saving/restoring at context-switch doesn't interact well with
> > > event rotation (e.g. if filters change)
> > 
> > In the brbe_enable() function, it "Merge the permitted branch filters
> > of all events".  Based on current implementation, all events share the
> > same branch filter.
> 
> Critically, the brbe_enable() function merges the filters of all
> *active* events which have been installed into hardware. It does not
> track all events which can be rotated, and the resulting filter is not
> the same -- it can change as a result of rotation.

In a perf session has multiple events, and events have different branch
filters, seems to me, a simple way is to return error for this case.

I would argue BRBE is an IP for recording branches per CPU wise, it does
not support recording for event wise.  If we can unify branch filter
within a perf session, would this be much easier for handling?

> > When event rotation happens, if without context switch, in theory we
> > should can directly use the branch record (no invalidation, no injection)
> > for all events.
> 
> No; that only works in *some* cases, and will produce incorrect results
> in others.
> 
> For example, consider filtering. Imagine a PMU with a single counter,
> and two events, where event-A filters for calls-and-returns and event-B
> filters for calls-only. When switching from event-A to event-B, it's
> theoretically possible to keep the existing records around, knowing that
> the returns can be filtered out later. When switching from event-B to
> event-A we cannot keep the existing records, since there are gaps
> whenever a return should have been recorded.

Seems to me, the problem is not caused by event rotation.  We need to
calculate a correct filter in the first place - the BRBE driver should
calculate a superset for all filters of events for a session.  Then,
generate branch record based event's specific filter.

> There are a number of cases of that shape given the set of configurable
> filters. In theory it's possible to retain those in some cases, but I
> don't think that the complexity is justified.
> 
> Similarly, whenever kernel branches are recorded it's necessary to drop
> the stale branches whenever branch recording is paused, as there's
> necessarily a blackout period and hence a gap in the records.

If we save BRBE record when a process is switched out and then restore
the record when a process is switched in, should we can keep a decent
branch record for performance profiling?

I understand it might be many things happen in the middle of a task
switching or migration, but it is fine for not recording branches during
the blackout period.  The missed branch records are not very helpful for
forming a flow for a profiled program itself, especially, if we
consider we will optimize userspace program in many cases.

> Do you think that you have a case where losing branches across rotation
> *really* matters?

I agreed that event rotation case might be rare and complex.  Please see
a comment below.

> > For a context-switch case, we need to save and re-inject branch record.
> > BRBE record sticks to a process context, no matter what events have been
> > enabled.
> 
> I had originally wanted to keep per-event records around, but it doesn't
> work in all cases. One reason events get discarded at context-switch
> time is that CPU-bound events can sample branches, and would
> mis-attribute stale userspace branches to the wrong context when
> switching tasks. There are explicit comments about this in
> amd_pmu_brs_sched_task() and intel_pmu_lbr_sched_task().
> 
> Given we discard records when reprogramming events, we *could* try to
> preserve events in some cases, but I suspect that as with the rotation
> case this'll be a lot of complexity for little gain. Note that as we
> discard events when enabling the PMU, we'd throw some task-bound records
> away anyway, and practically the gain would be limited to cpu-bound
> records.
> 
> Do you have a reason why you think we *must* keep events around?

Here I am really concerned are cases when a process is preempted or
migrated.  The driver doesn't save and restore branch records for these
cases, it just invalidates all records when a task is scheduled in.

As a result, if an event overflow is close to context switch, it is
likely to capture incomplete branch records.  For a userspace-only
tracing, it is risk to capture empty branch record after preemption
and migrations.

Thanks,
Leo




[Index of Archives]     [Kernel Newbies]     [Security]     [Netfilter]     [Bugtraq]     [Linux FS]     [Yosemite Forum]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Samba]     [Video 4 Linux]     [Device Mapper]     [Linux Resources]

  Powered by Linux