On Tue, Sep 7, 2021 at 12:02 PM Song Liu <songliubraving@xxxxxx> wrote:
>
> > On Sep 3, 2021, at 9:50 AM, Song Liu <songliubraving@xxxxxx> wrote:
> >
> >> On Sep 3, 2021, at 1:02 AM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> >>
> >> On Thu, Sep 02, 2021 at 09:57:04AM -0700, Song Liu wrote:
> >>> +static int
> >>> +intel_pmu_snapshot_branch_stack(struct perf_branch_entry *entries, unsigned int cnt)
> >>> +{
> >>> +	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
> >>> +
> >>> +	intel_pmu_disable_all();
> >>> +	intel_pmu_lbr_read();
> >>> +	cnt = min_t(unsigned int, cnt, x86_pmu.lbr_nr);
> >>> +
> >>> +	memcpy(entries, cpuc->lbr_entries, sizeof(struct perf_branch_entry) * cnt);
> >>> +	intel_pmu_enable_all(0);
> >>> +	return cnt;
> >>> +}
> >>
> >> Would something like the below help get rid of that memcpy()?
> >>
> >> (compile tested only)
> >
> > We can get rid of the memcpy. But we will need an extra "size" or "num_entries"
> > parameter for intel_pmu_lbr_read. I can add this change in the next version.
> >
>
> This is trickier than I thought. The current lbr_read() function works with
> perf_branch_stack, while the BPF helper side uses an array of perf_branch_entry,
> and that array is passed into the helper by the BPF program. Therefore, to
> really get rid of the memcpy, we need to refactor the lbr driver code more.
> How about we keep the memcpy for now, and add the optimization later (if we
> think it is necessary)?
>

Sounds good to me!

> Thanks,
> Song
>
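
For anyone reading the thread later, the clamp-and-copy pattern being kept can be sketched in plain userspace C as below. This is only an illustration under stated assumptions: struct branch_entry, struct lbr_state, MAX_LBR_ENTRIES and snapshot_branch_stack() are hypothetical stand-ins, not the kernel's definitions, and the PMU disable/read/enable calls from the quoted patch are left out. It shows why the copy exists: the driver fills its own buffer, while the caller supplies a separate array of its own size.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct perf_branch_entry (fields simplified). */
struct branch_entry {
	unsigned long long from;
	unsigned long long to;
};

#define MAX_LBR_ENTRIES 32	/* assumed capacity, for illustration only */

/* Hypothetical stand-in for the driver-owned buffer (cpuc->lbr_entries). */
struct lbr_state {
	unsigned int nr;	/* number of valid entries, <= MAX_LBR_ENTRIES */
	struct branch_entry entries[MAX_LBR_ENTRIES];
};

/*
 * Copy at most cnt entries from the driver-owned buffer into the
 * caller-supplied array and return how many were copied.
 */
static unsigned int
snapshot_branch_stack(const struct lbr_state *state,
		      struct branch_entry *entries, unsigned int cnt)
{
	assert(state != NULL && entries != NULL);

	/* Clamp to what the driver actually has, like min_t() in the patch. */
	if (cnt > state->nr)
		cnt = state->nr;

	memcpy(entries, state->entries, sizeof(*entries) * cnt);
	return cnt;
}
```

Dropping the memcpy would mean having the read path write straight into the caller's array, which is the extra refactoring discussed above.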