On Mon, 24 Jan 2022, Andrii Nakryiko wrote:

> On Mon, Jan 24, 2022 at 6:14 AM Alan Maguire <alan.maguire@xxxxxxxxxx> wrote:
> >
> > I think for users it'd be good to clarify what the overheads are. If I
> > want to see malloc()s in a particular process, say I specify the libc
> > path along with the process ID I'm interested in. This adds the
> > breakpoint to libc and will - as far as I understand it - trigger
> > breakpoints system-wide which are then filtered out by uprobe perf handling
> > for the specific process specified. That's pretty expensive
> > performance-wise, so we should probably try and give users options to
> > limit the cost in cases where they don't want to incur system-wide
> > overheads. I've been investigating adding support for instrumenting shared
> > library calls _within_ programs by placing the breakpoint on the procedure
> > linking table call associated with the function. As this is local to the
>
> You mean to patch PLT stubs ([0])?

Yep, the .plt table, as shown by "objdump -D -j .plt <program>":

Disassembly of section .plt:

000000000040d020 <.plt>:
  40d020:  ff 35 e2 5f 4b 00    pushq  0x4b5fe2(%rip)   # 8c3008 <_GLOBAL_OFFSET_TABLE_+0x8>
  40d026:  ff 25 e4 5f 4b 00    jmpq   *0x4b5fe4(%rip)  # 8c3010 <_GLOBAL_OFFSET_TABLE_+0x10>
  40d02c:  0f 1f 40 00          nopl   0x0(%rax)

000000000040d030 <inet_ntop@plt>:
  40d030:  ff 25 e2 5f 4b 00    jmpq   *0x4b5fe2(%rip)  # 8c3018 <inet_ntop@GLIBC_2.2.5>
  40d036:  68 00 00 00 00       pushq  $0x0
  40d03b:  e9 e0 ff ff ff       jmpq   40d020 <.plt>

In the case of inet_ntop() the PLT entry is at 0x40d030. We then do the
relative address calculation on that, giving 0xd030 as the offset to be
uprobe'd.

> One concern with that is (besides
> making sure that pt_regs still have exactly the same semantics and
> stuff) that uprobes are much faster when patching nop instructions (if
> the library was compiled with nop "preambles". Do you know if @plt
> entries can be compiled with nops as well?

I haven't found any way to do that unfortunately.
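
To make the relative address calculation above concrete, here is a minimal
sketch in Python. It assumes the segment containing .plt is mapped at
virtual address 0x400000 from file offset 0, which is consistent with the
0x40d030 -> 0xd030 result above but is not shown in the objdump output; the
real values would come from the PT_LOAD program header covering the address.
The function name is made up for illustration.

```python
def vaddr_to_file_offset(vaddr, seg_vaddr, seg_offset):
    """Translate a virtual address into a file offset, given the virtual
    address and file offset of the PT_LOAD segment that contains it."""
    if vaddr < seg_vaddr:
        raise ValueError("address lies below the start of the segment")
    return vaddr - seg_vaddr + seg_offset


# Assumed segment layout (hypothetical): mapped at 0x400000, file offset 0.
SEG_VADDR = 0x400000
SEG_OFFSET = 0x0

# Address of inet_ntop@plt taken from the objdump output above.
plt_entry = 0x40d030
print(hex(vaddr_to_file_offset(plt_entry, SEG_VADDR, SEG_OFFSET)))
```

The segment values can be checked with "readelf -lW <program>"; if the
covering LOAD segment has a different address or file offset, the result
changes accordingly.
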
> The difference in
> performance is more than 2x from my non-scientific testing recently.
> So it can be a pretty big difference.
>

Interesting! There may be a cleaner way to achieve the goal of tracing
shared library calls in the local binary, but I'm not seeing an alternate
approach yet unfortunately. To me the key thing is to ensure we have
an alternative to globally tracing in libc.

I'll send out the v2 addressing the things you found in the RFC shortly
(and that uses the .plt instrumentation approach).

Thanks!

Alan