Mathieu Poirier <mathieu.poirier@xxxxxxxxxx> writes:

> On 22 September 2015 at 08:29, Alexander Shishkin
> <alexander.shishkin@xxxxxxxxxxxxxxx> wrote:
>> Mathieu Poirier <mathieu.poirier@xxxxxxxxxx> writes:
>>
>>> +static void etm_event_destroy(struct perf_event *event)
>>> +{
>>> +	/* switching off the source will also tear down the path */
>>> +	etm_event_power_sources(event->cpu, false);
>>> +}
>>> +
>>> +static int etm_event_init(struct perf_event *event)
>>> +{
>>> +	int ret;
>>> +
>>> +	if (event->attr.type != etm_pmu.type)
>>> +		return -ENOENT;
>>> +
>>> +	if (event->cpu >= nr_cpu_ids)
>>> +		return -EINVAL;
>>> +
>>> +	/* only one session at a time */
>>> +	if (etm_event_source_enabled(event->cpu))
>>> +		return -EBUSY;
>>
>> Why is this the case? If you were to configure the event in pmu::add()
>> and deconfigure it in pmu::del(), like you already do with the buffer
>> part, you could handle as many sessions as you want.
>
> Apologies for the late reply, I was travelling.
>
> We certainly don't want to have more than once trace session going on
> at any given time, especially if the sessions have different
> configuration parameters.  Moreover doing the tracer configuration as
> part of pmu::add() is highly redundant.

But why? The whole point of using perf for this is that it does all the
tricky context switching for us, all the cross-cpu calling to
enable/disable the events etc so that we can run multiple sessions in
parallel without having to worry (much) about scheduling. (Aside, of
course, from other useful things like sideband events, but that's
another topic).

>> This can be done in pmu::add(), if you can call directly into
>> etm_configure_cpu() or etm_config_enable() so that there's no cross-cpu
>> calling in between.
>
> As per my comment above, reconfiguring the tracers every time it is
> about to run is redundant and extensive (etm_configure_cpu() isn't
> exactly short), incurring a cost that is likely to be higher than
> calling get_online_cpus().

I was actually referring to synchronous smp_function_call*()s that
obviously won't work here. The good news is that they are also
redundant.

But I don't see anything expensive in configuring etm and etb in
pmu::add(); as far as I can tell, it's just a bunch of register writes.
If you want to optimize those, you could compare the new context against
the previous one and only update registers that need to be updated. You
could also get rid of the spinlock, because there won't be any local
racing (again, afaict neither ETM nor ETB generate interrupts).

That said, one expensive thing is reading out the ETB buffer on every
sched out, and that is the real problem, because it slows down the fast
path by a loop of arbitrary length reading out hw registers. IIRC, ETBs
could be up to 64K? But a TMC-enabled coresight should do much better in
this regard.

Thanks,
--
Alex
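
P.S. To make the "only update registers that need to be updated" idea
concrete, here is a rough userspace sketch. Everything in it is
hypothetical and for illustration only: struct fake_etm stands in for
the memory-mapped tracer, struct etm_ctx for a per-event register image,
and NR_REGS, etm_write() and etm_ctx_switch() are made-up names, not
anything from the coresight driver. It also assumes that "prev" really
is what was last programmed into the hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NR_REGS 8	/* hypothetical number of config registers */

/* Stand-in for the memory-mapped tracer registers (hypothetical). */
struct fake_etm {
	uint32_t regs[NR_REGS];
	unsigned int nr_writes;	/* counts "hardware" writes */
};

/* Per-event register image, e.g. prepared once when the event is set up. */
struct etm_ctx {
	uint32_t regs[NR_REGS];
};

/* Single register write; a real driver would do an MMIO write here. */
static void etm_write(struct fake_etm *etm, size_t idx, uint32_t val)
{
	assert(idx < NR_REGS);
	etm->regs[idx] = val;
	etm->nr_writes++;
}

/*
 * Program only the registers that differ between the context that ran
 * last (prev, NULL if nothing has been programmed yet) and the one
 * being scheduled in (next). Returns the number of registers written.
 */
static unsigned int etm_ctx_switch(struct fake_etm *etm,
				   const struct etm_ctx *prev,
				   const struct etm_ctx *next)
{
	unsigned int written = 0;
	size_t i;

	for (i = 0; i < NR_REGS; i++) {
		/* Skip registers that already hold the wanted value. */
		if (prev && prev->regs[i] == next->regs[i])
			continue;
		etm_write(etm, i, next->regs[i]);
		written++;
	}

	return written;
}
```

With something along these lines, switching between two sessions that
share most of their configuration costs a handful of writes rather than
a full reprogramming, and rescheduling the same event costs none.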