Hi Peter, > On Aug 19, 2021, at 11:06 AM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote: > > On Thu, Aug 19, 2021 at 04:46:20PM +0000, Song Liu wrote: >>> void perf_inject_event(struct perf_event *event, struct pt_regs *regs) >>> { >>> struct perf_sample_data data; >>> struct pmu *pmu = event->pmu; >>> unsigned long flags; >>> >>> local_irq_save(flags); >>> perf_pmu_disable(pmu); >>> >>> perf_sample_data_init(&data, 0, 0); >>> /* >>> * XXX or a variant with more _ that starts at the overflow >>> * handler... >>> */ >>> __perf_event_overflow(event, 0, &data, regs); >>> >>> perf_pmu_enable(pmu); >>> local_irq_restore(flags); >>> } >>> >>> But please consider carefully, I haven't... >> >> Hmm... This is a little weird to me. >> IIUC, we need to call perf_inject_event() after the software event, say >> a kretprobe, triggers. So it gonna look like: >> >> 1. kretprobe trigger; >> 2. handler calls perf_inject_event(); >> 3. PMI kicks in, and saves LBR; > > This doesn't actually happen. I overlooked the fact that we need the PMI > to fill out @data for us. > >> 4. after the PMI, consumer of LBR uses the saved data; > > Normal overflow handler will have data->br_stack set, but I now realize > that the 'psuedo' code above will not get that. We need to somehow get > the arch bits involved; again :/ > >> However, given perf_inject_event() disables PMU, we can just save the LBR >> right there? And it should be a lot easier? Something like: >> >> 1. kretprobe triggers; >> 2. handler calls perf_snapshot_lbr(); >> 2.1 perf_pmu_disable(pmu); >> 2.2 saves LBR >> 2.3 perf_pmu_enable(pmu); >> 3. consumer of LBR uses the saved data; >> >> What is the downside of this approach? > > It would be perf_snapshot_branch_stack() and would require a new > (optional) pmu::method to set up the branch stack. I guess it would look like: diff --git i/include/linux/perf_event.h w/include/linux/perf_event.h index fe156a8170aa3..af379b7f18050 100644 --- i/include/linux/perf_event.h +++ w/include/linux/perf_event.h @@ -514,6 +514,9 @@ struct pmu { * Check period value for PERF_EVENT_IOC_PERIOD ioctl. */ int (*check_period) (struct perf_event *event, u64 value); /* optional */ + + int (*snapshot_branch_stack) (struct perf_event *event, /* TBD, maybe struct + perf_output_handle? */); }; enum perf_addr_filter_action_t { diff --git i/kernel/events/core.c w/kernel/events/core.c index 2d1e63dd97f23..14aa5f7bccf1f 100644 --- i/kernel/events/core.c +++ w/kernel/events/core.c @@ -1207,6 +1207,19 @@ void perf_pmu_enable(struct pmu *pmu) pmu->pmu_enable(pmu); } +int perf_snapshot_branch_stack(struct perf_event *event) +{ + struct pmu *pmu = event->pmu; + int ret; + + if (!pmu->snapshot_branch_stack) + return -EOPNOTSUPP; + perf_pmu_disable(pmu); + ret = pmu->snapshot_branch_stack(event, ...); + perf_pmu_enable(pmu); + return 0; +} + static DEFINE_PER_CPU(struct list_head, active_ctx_list); > > And if we're going to be adding new pmu::methods then I figure one that > does the whole sample state might be more useful. What do you mean by "whole sample state"? To integrate with exiting perf_sample_data, like perf_output_sample()? Thanks, Song