Re: Tracing NVMe Driver with BPF missing events

On Tue, May 24, 2022 at 9:12 AM John Mazzie <john.p.mazzie@xxxxxxxxx> wrote:
>
> After thinking about this more, maybe it's more to do with the
> interrupt handler for nvme_complete_rq.
>
> Sometimes the handler for nvme_setup_cmd could be interrupted to
> handle nvme_complete_rq, and in that case the nvme_complete_rq
> handler wouldn't run, because the nvme_setup_cmd handler has not
> completed yet.
>
> Is this understanding correct?

Yes.

> I'm assuming there is no real
> workaround for this situation, so we may just need to accept some
> missed events.

Try using fentry/fexit programs instead. They use a different
reentrancy protection scheme that works at a per-program level.
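
Something along these lines (untested sketch: fentry/fexit needs BTF
and a reasonably recent kernel, 5.5+ upstream, so your 5.4 kernel may
not have it; the nvme_setup_cmd argument list has also changed across
kernel versions, and this assumes neither function is inlined):

#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

/* fentry attaches to the kernel function itself, not the tracepoint */
SEC("fentry/nvme_setup_cmd")
int BPF_PROG(count_setup, struct nvme_ns *ns, struct request *req)
{
        /* same counting/filtering logic as your tracepoint handler */
        return 0;
}

SEC("fentry/nvme_complete_rq")
int BPF_PROG(count_complete, struct request *req)
{
        return 0;
}

char LICENSE[] SEC("license") = "GPL";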

>
> John
>
> On Sat, May 21, 2022 at 11:52 AM John Mazzie <john.p.mazzie@xxxxxxxxx> wrote:
> >
> > In this case, is a BPF program the individual handler of a
> > tracepoint, or, in my context, is my compiled program that traces
> > both tracepoints a single BPF program? We aren't running any other
> > BPF tracing during these tests besides my program counting these 2
> > tracepoints.
> >
> > In my program I have 2 handlers, one for
> > tracepoint:nvme:nvme_setup_cmd and another for
> > tracepoint:nvme:nvme_complete_rq. I've created a separate
> > PERCPU_HASH map for each handler that keeps track of how many
> > times that handler is invoked. The only thing each handler does is
> > increment the count value in its map, though I do filter by device
> > on each tracepoint. If I comment out the nvme_setup_cmd code,
> > nvme_complete_rq does get the correct count.
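> >
> > Roughly, a map and its handler look like this (simplified sketch;
> > the device filter is omitted and a single fixed key stands in for
> > my real key):
> >
> > #include "vmlinux.h"
> > #include <bpf/bpf_helpers.h>
> >
> > struct {
> >         __uint(type, BPF_MAP_TYPE_PERCPU_HASH);
> >         __uint(max_entries, 1);
> >         __type(key, __u32);
> >         __type(value, __u64);
> > } setup_cnt SEC(".maps");
> >
> > SEC("tracepoint/nvme/nvme_setup_cmd")
> > int handle_setup(void *ctx)
> > {
> >         __u32 key = 0;
> >         __u64 one = 1;
> >         __u64 *val = bpf_map_lookup_elem(&setup_cnt, &key);
> >
> >         /* each CPU bumps its own copy, so no atomics are needed */
> >         if (val)
> >                 (*val)++;
> >         else
> >                 bpf_map_update_elem(&setup_cnt, &key, &one, BPF_ANY);
> >         return 0;
> > }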
> >
> > The user side of my program just prints the values from each of
> > these maps at a 10-second interval.
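> >
> > Since the maps are per-CPU, the user side sums the per-CPU copies
> > for each key; mine does roughly this (sketch, error handling
> > trimmed; map_fd comes from bpf_map__fd()):
> >
> > #include <stdio.h>
> > #include <bpf/bpf.h>
> > #include <bpf/libbpf.h>
> >
> > static void print_total(int map_fd, const char *name)
> > {
> >         int ncpus = libbpf_num_possible_cpus();
> >         __u64 vals[ncpus], total = 0;
> >         __u32 key = 0;
> >
> >         /* a lookup on a per-CPU map fills one value per possible CPU */
> >         if (bpf_map_lookup_elem(map_fd, &key, vals) == 0)
> >                 for (int i = 0; i < ncpus; i++)
> >                         total += vals[i];
> >         printf("%s: %llu\n", name, (unsigned long long)total);
> > }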
> >
> > I can share my code to make this easier. Is it preferred that I
> > upload it to GitHub and share the link in this thread?
> >
> > I agree that what you describe could be my issue, but I just want
> > to make sure we're on the same page since I'm less familiar with
> > the internals of BPF.
> >
> > Thanks,
> > John
> >
> > On Fri, May 20, 2022 at 7:10 PM Andrii Nakryiko
> > <andrii.nakryiko@xxxxxxxxx> wrote:
> > >
> > > On Wed, May 18, 2022 at 2:35 PM John Mazzie <john.p.mazzie@xxxxxxxxx> wrote:
> > > >
> > > > My group at Micron is using BPF and loves the tracing capabilities it
> > > > provides. We are mainly focused on the storage subsystem and BPF has
> > > > been really helpful in understanding how the storage subsystem
> > > > interacts with our drives while running applications.
> > > >
> > > > In the process of developing a tool using BPF to trace the nvme
> > > > driver, we ran into an issue with some missing events. I wanted
> > > > to check whether this is a bug/limitation I'm hitting or
> > > > expected behavior under heavy tracing. We are trying to trace 2
> > > > tracepoints (nvme_setup_cmd and nvme_complete_rq) around 1M
> > > > times a second.
> > > > We noticed that if we trace just one of the two, we see all the
> > > > expected events, but if we trace both at the same time, the
> > > > nvme_complete_rq tracepoint
> > >
> > > kprobe and tracepoint programs have per-CPU reentrancy protection.
> > > That is, if a BPF kprobe/tracepoint program is running on a CPU and
> > > something would trigger another BPF program on that same CPU (e.g.,
> > > the BPF program calls a kernel function that has another BPF
> > > program attached to it, or preemption happens and another BPF
> > > program is supposed to run), then that nested BPF program
> > > invocation will be skipped.
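> > >
> > > Conceptually, the entry path does something like this (paraphrased
> > > from trace_call_bpf() in kernel/trace/bpf_trace.c):
> > >
> > >         if (__this_cpu_inc_return(bpf_prog_active) != 1) {
> > >                 /* a BPF program is already running on this CPU:
> > >                  * silently skip this invocation */
> > >                 ret = 0;
> > >                 goto out;
> > >         }
> > >         ret = bpf_prog_run(prog, ctx);
> > > out:
> > >         __this_cpu_dec(bpf_prog_active);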
> > >
> > > This might be what happens in your case.
> > >
> > > > misses events. I am using two different percpu_hash maps to count the
> > > > events, one for setup and another for complete. My expectation was
> > > > that tracing these events would affect performance somewhat, but not
> > > > miss events. Ultimately the tool would be used to trace nvme latencies
> > > > at the driver level by device and process.
> > > >
> > > > My tool was developed using libbpf v0.7, and I've tested on Rocky
> > > > Linux 8.5 (Kernel 4.18.0), Ubuntu 20.04 (Kernel 5.4) and Fedora 36
> > > > (Kernel 5.17.6) with the same results.
> > > >
> > > > Thanks,
> > > > John Mazzie
> > > > Principal Storage Solutions Engineer
> > > > Micron Technology, Inc.


