Re: Why return probes of some syscalls sometimes are not called?

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, 13 Mar 2017 09:09:47 +0000
"Dorau, Lukasz" <lukasz.dorau@xxxxxxxxx> wrote:

> On Friday, March 10, 2017 11:51 AM Masami Hiramatsu <mhiramat@xxxxxxxxxx> wrote:
> > On Thu, 9 Mar 2017 18:00:54 +0100
> > Masami Hiramatsu <mhiramat@xxxxxxxxxx> wrote:
> > 
> > > On Thu, 9 Mar 2017 13:44:38 +0000
> > > "Dorau, Lukasz" <lukasz.dorau@xxxxxxxxx> wrote:
> > >
> > > > Hi,
> > > >
> > > > Could someone explain me why return probes of some syscalls (for example:
> > futex, poll, epoll_wait) sometimes are not called?
> > > >
> > > > It can be reproduced using the following bash script:
> > > > https://gist.github.com/ldorau/c439d9ec7635409a5016c42e3a9121ec
> > > >
> > > > Here are results gathered from 60 seconds test run on kernel 4.9.12 (Fedora
> > 24):
> > > >
> > > > futex:       p 56904    r 5489     (90% did not return (51415))
> > > > poll:        p 43466    r 7703     (82% did not return (35763))
> > > > epoll_wait:  p 73366    r 23551    (67% did not return (49815))
> > > >
> > > > The whole log is attached below.
> > > >
> > > > Lukasz
> > > >
> > > > ---
> > > > # ./test_kprobes.sh 60
> > > >
> > > > Will trace using following kprobe_events:
> > > > r:kprobes/r_futex sys_futex
> > > > p:kprobes/p_futex sys_futex
> > > > r:kprobes/r_poll sys_poll
> > > > p:kprobes/p_poll sys_poll
> > > > r:kprobes/r_epoll_wait sys_epoll_wait
> > > > p:kprobes/p_epoll_wait sys_epoll_wait
> > > > r:kprobes/r_select sys_select
> > > > p:kprobes/p_select sys_select
> > > > r:kprobes/r_fork sys_fork
> > > > p:kprobes/p_fork sys_fork
> > > > r:kprobes/r_vfork sys_vfork
> > > > p:kprobes/p_vfork sys_vfork
> > > > r:kprobes/r_mmap sys_mmap
> > > > p:kprobes/p_mmap sys_mmap
> > > > r:kprobes/r_open sys_open
> > > > p:kprobes/p_open sys_open
> > > > r:kprobes/r_close sys_close
> > > > p:kprobes/p_close sys_close
> > > > r:kprobes/r_write sys_write
> > > > p:kprobes/p_write sys_write
> > > > r:kprobes/r_read sys_read
> > > > p:kprobes/p_read sys_read
> > > >
> > > > Results (60 sec):
> > > > futex:       p 56904    r 5489     (90% did not return (51415))
> > > > poll:        p 43466    r 7703     (82% did not return (35763))
> > > > epoll_wait:  p 73366    r 23551    (67% did not return (49815))
> > > > select:      p 13355    r 13351    (0% did not return (4))
> > > > fork:        p 0        r 0        (OK)
> > > > vfork:       p 0        r 0        (OK)
> > > > mmap:        p 4328     r 4328     (OK)
> > > > open:        p 4579     r 4579     (OK)
> > > > close:       p 7163     r 7163     (OK)
> > > > write:       p 22769    r 22769    (OK)
> > > > read:        p 40014    r 40014    (OK)
> > > >
> > >
> > > OK, I just dumped kprobe_profile after the test,
> > >
> > > futex:       p 7775     r 698      (91% did not return (7077))
> > > poll:        p 18409    r 3523     (80% did not return (14886))
> > > epoll_wait:  p 5122     r 2389     (53% did not return (2733))
> > > select:      p 5411     r 5286     (2% did not return (125))
> > > fork:        p 0        r 0        (OK)
> > > vfork:       p 0        r 0        (OK)
> > > mmap:        p 179      r 179      (OK)
> > > open:        p 921      r 921      (OK)
> > > close:       p 1005     r 1005     (OK)
> > > write:       p 5632     r 5631     (0% did not return (1))
> > > read:        p 10879    r 10878    (0% did not return (1))
> > >   r_futex                                                 1091               0
> > >   p_futex                                                 8986               0
> > >   r_poll                                                  4276               0
> > >   p_poll                                                 20610               0
> > >   r_epoll_wait                                            2849               0
> > >   p_epoll_wait                                            5905               0
> > >   r_select                                               10172               0
> > >   p_select                                               10312               0
> > >   r_fork                                                     0               0
> > >   p_fork                                                     0               0
> > >   r_vfork                                                    0               0
> > >   p_vfork                                                    0               0
> > >   r_mmap                                                  2987               0
> > >   p_mmap                                                  2884               0
> > >   r_open                                                  4002               0
> > >   p_open                                                  3855               0
> > >   r_close                                                 4262               0
> > >   p_close                                                 4122               0
> > >   r_write                                                14281               0
> > >   p_write                                                14276               0
> > >   r_read                                                 41494               0
> > >   p_read                                                 41363               0
> > >
> > > OK, there are no miss-counted event. So at least kprobe and kretprobe working
> > correctly.
> > > I also tried to use static tracepoints for that.
> > > https://gist.github.com/mhiramat/04d782deaf110a6b3391e2ecb923325e
> > >
> > > ----
> > > $ sudo bash test_kprobes.sh 10
> > > Disabling all current probes ...
> > > Removing old probe points ...
> > > Adding new probe points ...
> > > Enabling new probe points ...
> > > Clearing old trace log ...
> > >
> > > Tracing for 10 seconds ... done
> > > Saving trace log ... done
> > >
> > > Results (10 sec):
> > > futex:       p 9856     r 1258     (87% did not return (8598))
> > > poll:        p 7531     r 4031     (46% did not return (3500))
> > > epoll_wait:  p 1996     r 1617     (18% did not return (379))
> > > select:      p 273      r 257      (5% did not return (16))
> > > Results (10 sec):
> > > futex:       p 9851     r 9851     (OK)
> > > poll:        p 7531     r 7531     (OK)
> > > epoll_wait:  p 1996     r 1996     (OK)
> > > select:      p 273      r 273      (OK)
> > > ----
> > > Of course these events are collected in "different places" so
> > > it could be different. But yeah, it seems sys_* function itself
> > > dosen't return but jumps into ret_from_syscall etc.
> > 
> > Please try attached patch (that is my miss that forgot to accumulate missed
> > count of kretprobe).
> > 
> 
> Thanks for the patch. I ran two tests using the following script:
> https://gist.github.com/ldorau/36d87640ff0659a483342d490b66f034
> The results are following:
> 
> 1) The first test - during 60-seconds test the system was almost idle
>    (I did not touch the keyboard or mouse)
> 
> KProbes results (60 sec):
> futex:       p 1373     r 422      (69% did not return (951))
> poll:        p 3304     r 1575     (52% did not return (1729))
> epoll_wait:  p 1062     r 943      (11% did not return (119))
> select:      p 5978     r 5977     (0% did not return (1))
> mmap:        p 264      r 264      (OK)
> open:        p 1056     r 1056     (OK)
> close:       p 1198     r 1198     (OK)
> write:       p 4786     r 4786     (OK)
> read:        p 8939     r 8939     (OK)
> exit:        p 1        r 0    
> fork:        p 0        r 0        (OK)
> vfork:       p 0        r 0        (OK)
> clone:       p 5        r 5        (OK)
> execve:      p 4        r 4        (OK)
> 
> Tracepoints results (60 sec):
> futex:       p 1372     r 1372     (OK)
> poll:        p 3304     r 3303     (0% did not return (1))
> epoll_wait:  p 1062     r 1062     (OK)
> select:      p 5978     r 5978     (OK)
> mmap:        p 264      r 264      (OK)
> open:        p 1056     r 1056     (OK)
> close:       p 1198     r 1198     (OK)
> write:       p 4786     r 4786     (OK)
> read:        p 8939     r 8940     (OK)
> exit:        p 1        r 0    
> 
> 2) The second test - during 60-seconds test I was normally using the system -
>    - I launched an internet browser, edited a file...
> 
> KProbes results (60 sec):
> futex:       p 35394    r 689      (98% did not return (34705))
> poll:        p 23449    r 896      (96% did not return (22553))
> epoll_wait:  p 43927    r 4973     (88% did not return (38954))
> select:      p 4874     r 4873     (0% did not return (1))
> mmap:        p 3165     r 3165     (OK)
> open:        p 3680     r 3678     (0% did not return (2))
> close:       p 6704     r 6704     (OK)
> write:       p 11769    r 11769    (OK)
> read:        p 22645    r 22647    (OK)
> exit:        p 30       r 0        
> fork:        p 0        r 0        (OK)
> vfork:       p 0        r 0        (OK)
> clone:       p 170      r 170      (OK)
> execve:      p 2        r 2        (OK)
> 
> Tracepoints results (60 sec):
> futex:       p 35359    r 35268    (0% did not return (91))
> poll:        p 23448    r 23444    (0% did not return (4))
> epoll_wait:  p 43926    r 43909    (0% did not return (17))
> select:      p 4874     r 4874     (OK)
> mmap:        p 3165     r 3165     (OK)
> open:        p 3680     r 3827     (OK)
> close:       p 6704     r 6704     (OK)
> write:       p 11769    r 11770    (OK)
> read:        p 22645    r 22648    (OK)
> exit:        p 30       r 0        
> 

I see. The patch was to identify the reason why the return event was
lost. Not for fixing the issue.
To fix this issue, I need some more work.

Thank you,

> Lukasz
> 
> 
> 
> > I've focused on futex and it showed there were some "misses" of kretprobe.
> > 
> > ----
> > Disabling all current probes ...
> > Removing old probe points ...
> > Adding new probe points ...
> > Enabling new probe points ...
> > Clearing old trace log ...
> > 
> > Tracing for 10 seconds ... done
> > Saving trace log ... done
> > 
> > Results (10 sec):
> > futex:       p 190      r 74       (61% did not return (116))
> > Results (10 sec):
> > futex:       p 190      r 190      (OK)
> > 
> > $ cat /sys/kernel/debug/tracing/kprobe_profile
> >   r_futex                                                  176             127
> >   p_futex                                                  298               0
> > -----
> > 
> > Actually, since the kretprobe is not well designed for the usecase
> > probing on the sleep-able functions, it just fails if it invoked
> > more than the number of CPUs in parallel.
> > It is currently a limitation of kretprobe, but I'll solve it.
> > 
> > Thanks,
> > 
> > --
> > Masami Hiramatsu <mhiramat@xxxxxxxxxx>


-- 
Masami Hiramatsu <mhiramat@xxxxxxxxxx>
--
To unsubscribe from this list: send the line "unsubscribe linux-trace-users" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux USB Development]     [Linux USB Development]     [Linux Audio Users]     [Yosemite Hiking]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux