On Wed, 6 Jun 2018 14:03:37 +0530 Ravi Bangoria <ravi.bangoria@xxxxxxxxxxxxx> wrote: > Why RFC again: > > This series is different from earlier versions[1]. Earlier series > implemented this feature in trace_uprobe while this has implemented > the logic in core uprobe. Few reasons for this: > 1. One of the major reason was the deadlock between uprobe_lock and > mm->mmap inside trace_uprobe_mmap(). That deadlock was not easy to fix > because mm->mmap is not in control of trace_uprobe_mmap() and it has > to take uprobe_lock to loop over trace_uprobe list. More details can > be found at[2]. With this new approach, there are no deadlocks found > so far. > 2. Many of the core uprobe function and data-structures needs to be > exported to make earlier implementation simple. With this new approach, > reference counter logic is been implemented in core uprobe and thus > no need to export anything. I agree with you. Moreover, since uprobe_register/unregister() are exported to modules, this enablement would better be implemented inside uprobe so that all uprobe users benefit from this. > > Description: > > Userspace Statically Defined Tracepoints[3] are dtrace style markers > inside userspace applications. Applications like PostgreSQL, MySQL, > Pthread, Perl, Python, Java, Ruby, Node.js, libvirt, QEMU, glib etc > have these markers embedded in them. These markers are added by developer > at important places in the code. Each marker source expands to a single > nop instruction in the compiled code but there may be additional > overhead for computing the marker arguments which expands to couple of > instructions. In case the overhead is more, execution of it can be > omitted by runtime if() condition when no one is tracing on the marker: > > if (reference_counter > 0) { > Execute marker instructions; > } > > Default value of reference counter is 0. Tracer has to increment the > reference counter before tracing on a marker and decrement it when > done with the tracing. > > Currently, perf tool has limited supports for SDT markers. I.e. it > can not trace markers surrounded by reference counter. Also, it's > not easy to add reference counter logic in userspace tool like perf, > so basic idea for this patchset is to add reference counter logic in > the trace_uprobe infrastructure. Ex,[4] > > # cat tick.c > ... > for (i = 0; i < 100; i++) { > DTRACE_PROBE1(tick, loop1, i); > if (TICK_LOOP2_ENABLED()) { > DTRACE_PROBE1(tick, loop2, i); > } > printf("hi: %d\n", i); > sleep(1); > } > ... > > Here tick:loop1 is marker without reference counter where as tick:loop2 > is surrounded by reference counter condition. > > # perf buildid-cache --add /tmp/tick > # perf probe sdt_tick:loop1 > # perf probe sdt_tick:loop2 > > # perf stat -e sdt_tick:loop1,sdt_tick:loop2 -- /tmp/tick > hi: 0 > hi: 1 > hi: 2 > ^C > Performance counter stats for '/tmp/tick': > 3 sdt_tick:loop1 > 0 sdt_tick:loop2 > 2.747086086 seconds time elapsed > > Perf failed to record data for tick:loop2. Same experiment with this > patch series: > > # ./perf buildid-cache --add /tmp/tick > # ./perf probe sdt_tick:loop2 > # ./perf stat -e sdt_tick:loop2 /tmp/tick > hi: 0 > hi: 1 > hi: 2 > ^C > Performance counter stats for '/tmp/tick': > 3 sdt_tick:loop2 > 2.561851452 seconds time elapsed > > > Note: > - 'reference counter' is called as 'semaphore' in original Dtrace > (or Systemtap, bcc and even in ELF) documentation and code. But the > term 'semaphore' is misleading in this context. This is just a counter > used to hold number of tracers tracing on a marker. This is not haeally > used for any synchronization. So we are referring it as 'reference > counter' in kernel / perf code. +1. I like this "reference counter" term :) > - This patches still has one issue. If there are multiple instances of > same application running and user wants to trace any particular > instance, trace_uprobe is updating reference counter in all instances. > This is not a problem on user side because instruction is not replaced > with trap/int3 and thus user will only see samples from his interested > process. But still this is more of a correctness issue. I'm working on > a fix for this. Hmm, it sounds like not a correctness issue, but there maybe a performace tradeoff. Tracing one particulear instance, other instances also will get a performance loss (Only if the parameter preparation block is heavy, because the heaviest part of probing - trap/int3 and recording data - isn't executed.) BTW, why this happens? I thought the refcounter part is just a data which is not shared among processes... Thank you, > > [1] https://lkml.org/lkml/2018/4/17/23 > [2] https://lkml.org/lkml/2018/5/25/111 > [3] https://sourceware.org/systemtap/wiki/UserSpaceProbeImplementation > [4] https://github.com/iovisor/bcc/issues/327#issuecomment-200576506 > > Ravi Bangoria (7): > Uprobes: Simplify uprobe_register() body > Uprobes: Support SDT markers having reference count (semaphore) > Uprobes/sdt: Fix multiple update of same reference counter > trace_uprobe/sdt: Prevent multiple reference counter for same uprobe > Uprobes/sdt: Prevent multiple reference counter for same uprobe > Uprobes/sdt: Document about reference counter > perf probe: Support SDT markers having reference counter (semaphore) > > Documentation/trace/uprobetracer.rst | 16 +- > include/linux/uprobes.h | 5 + > kernel/events/uprobes.c | 502 +++++++++++++++++++++++++++++++---- > kernel/trace/trace.c | 2 +- > kernel/trace/trace_uprobe.c | 74 +++++- > tools/perf/util/probe-event.c | 39 ++- > tools/perf/util/probe-event.h | 1 + > tools/perf/util/probe-file.c | 34 ++- > tools/perf/util/probe-file.h | 1 + > tools/perf/util/symbol-elf.c | 46 +++- > tools/perf/util/symbol.h | 7 + > 11 files changed, 643 insertions(+), 84 deletions(-) > > -- > 1.8.3.1 > -- Masami Hiramatsu <mhiramat@xxxxxxxxxx> -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html