On 03/16/2018 05:42 PM, Ravi Bangoria wrote: > > On 03/15/2018 08:19 PM, Oleg Nesterov wrote: >> On 03/13, Ravi Bangoria wrote: >>> For tiny binaries/libraries, different mmap regions points to the >>> same file portion. In such cases, we may increment reference counter >>> multiple times. >> Yes, >> >>> But while de-registration, reference counter will get >>> decremented only by once >> could you explain why this happens? sdt_increment_ref_ctr() and >> sdt_decrement_ref_ctr() look symmetrical, _decrement_ should see >> the same mappings? > Sorry, I thought this happens only for tiny binaries. But that is not the case. > This happens for binary / library of any length. > > Also, it's not a problem with sdt_increment_ref_ctr() / sdt_increment_ref_ctr(). > The problem happens with trace_uprobe_mmap_callback(). > > To illustrate in detail, I'm adding a pr_info() in trace_uprobe_mmap_callback(): > > vaddr = vma_offset_to_vaddr(vma, tu->ref_ctr_offset); > + pr_info("0x%lx-0x%lx : 0x%lx\n", vma->vm_start, vma->vm_end, vaddr); > sdt_update_ref_ctr(vma->vm_mm, vaddr, 1); > > > Ok now, libpython has SDT markers with reference counter: > > # readelf -n /usr/lib64/libpython2.7.so.1.0 | grep -A2 Provider > Provider: python > Name: function__entry > ... Semaphore: 0x00000000002899d8 > > Probing on that marker: > > # cd /sys/kernel/debug/tracing/ > # echo "p:sdt_python/function__entry /usr/lib64/libpython2.7.so.1.0:0x16a4d4(0x2799d8)" > uprobe_events > # echo 1 > events/sdt_python/function__entry/enable > > When I run python: > > # strace -o out python > mmap(NULL, 2738968, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7fff92460000 > mmap(0x7fff926a0000, 327680, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x230000) = 0x7fff926a0000 > mprotect(0x7fff926a0000, 65536, PROT_READ) = 0 > > The first mmap() maps the whole library into one region. Second mmap() > and third mprotect() split out the whole region into smaller vmas and sets > appropriate protection flags. > > Now, in this case, trace_uprobe_mmap_callback() updates reference counter > twice -- by second mmap() call and by third mprotect() call -- because both > regions contain reference counter offset. This I can verify in dmesg: > > # dmesg | tail > trace_kprobe: 0x7fff926a0000-0x7fff926f0000 : 0x7fff926e99d8 > trace_kprobe: 0x7fff926b0000-0x7fff926f0000 : 0x7fff926e99d8 > > Final vmas of libpython: > > # cat /proc/`pgrep python`/maps | grep libpython > 7fff92460000-7fff926a0000 r-xp 00000000 08:05 403934 /usr/lib64/libpython2.7.so.1.0 > 7fff926a0000-7fff926b0000 r--p 00230000 08:05 403934 /usr/lib64/libpython2.7.so.1.0 > 7fff926b0000-7fff926f0000 rw-p 00240000 08:05 403934 /usr/lib64/libpython2.7.so.1.0 > > > I see similar problem with normal binary as well. I'm using Brendan Gregg's > example[1]: > > # readelf -n /tmp/tick | grep -A2 Provider > Provider: tick > Name: loop2 > ... Semaphore: 0x000000001005003c > > Probing that marker: > > # echo "p:sdt_tick/loop2 /tmp/tick:0x6e4(0x10036)" > uprobe_events > # echo 1 > events/sdt_tick/loop2/enable > > Now when I run the binary > > # /tmp/tick > > load_elf_binary() internally calls mmap() and I see trace_uprobe_mmap_callback() > updating reference counter twice: > > # dmesg | tail > trace_kprobe: 0x10010000-0x10030000 : 0x10020036 > trace_kprobe: 0x10020000-0x10030000 : 0x10020036 > > proc/<pid>/maps of the tick: > > # cat /proc/`pgrep tick`/maps > 10000000-10010000 r-xp 00000000 08:05 1335712 /tmp/tick > 10010000-10020000 r--p 00000000 08:05 1335712 /tmp/tick > 10020000-10030000 rw-p 00010000 08:05 1335712 /tmp/tick > > [1] https://github.com/iovisor/bcc/issues/327#issuecomment-200576506 Also, while de-registration, we look for all existing mms using uprobe_build_mmap_info() and decrement the counter in each of the mm. i.e. we decrement the counter only once. -Ravi