Re: [LSF/MM/BPF TOPIC] time to reconsider tracepoints in the vfs?

Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> · Thu, 16 Jan 2025 13:43:39 -0800

On Thu, Jan 16, 2025 at 1:18 PM Dave Chinner <david@xxxxxxxxxxxxx> wrote:
>
> On Thu, Jan 16, 2025 at 07:49:49AM -0500, Theodore Ts'o wrote:
> > Historically, we have avoided adding tracepoints to the VFS because of
> > concerns that tracepoints would be considered a userspace-level
> > interface, and would therefore potentially constrain our ability to
> > improve an interface which has been extremely performance critical.
>
> Yes, the lack of tracepoints in the VFS is a fairly significant
> issue when it comes to runtime debugging of production systems...
>
> > I'd like to discuss whether in 2025, it's time to reconsider our
> > reticence in adding tracepoints in the VFS layer.  First, while there
> > has been a single incident of a tracepoint being used by programs that
> > were distributed far and wide (powertop) such that we had to revert a
> > change to a tracepoint that broke it --- that was *14* years ago,
> > in 2011.
>
> Yes, that was a big mistake in multiple ways. The first mistake was
> the app using a tracepoint in this way. The second was the response
> that "tracepoints should be stable API" based on the abuse of a
> single tracepoint.
>
> We had extensive tracepoint coverage in subsystems *before* this
> happened. In XFS, we had already converted hundreds of existing
> debug-build-only tracing calls to use tracepoints based on the
> understanding that tracepoints were *not* considered stable user
> interfaces.
>
> The fact that existing subsystem tracepoints already exposed the
> internal implementation of objects like struct inode, struct file,
> superblocks, etc simply wasn't considered when tracepoints were
> declared "stable".
>
> The fact is that it is simply not possible to maintain any sort of
> useful introspection with the tracepoint infrastructure without
> exposing internal implementation details that can change from kernel
> to kernel.
>
> > Across multiple other subsystems, many of
> > which have added an extensive number of tracepoints, there has been
> > only a single problem in over a decade, so I'd like to suggest that
> > this concern may not have been as serious as we had first
> > thought.
>
> Yes, these subsystems still operate under the "tracepoints are not
> stable" understanding.  The reality is that userspace has *never*
> been able to rely on tracepoints being stable across multiple kernel
> releases, regardless of what anyone else (including Linus) says is
> the policy.
>
> > I'd like to propose that we experiment with adding tracepoints in
> > early 2025, so that the year-end 2025 LTS kernels will have
> > tracepoints that we are confident will be fit for purpose for BPF
> > users.
>
> Why does BPF even need tracepoints? BPF code should be using kprobes
> to hook into the running kernel to monitor it, yes?

This is way more nuanced than that. There are at least a few
advantages that tracepoints have over kprobes, even if both are usable
(and useful) with BPF:

  - the functions you'd want to kprobe very often get inlined by the
compiler (especially if they are static functions), making them
unusable as kprobe targets (and kprobing inlined functions comes with
a huge set of additional hurdles and problems that we don't have to
go into here). This is probably the biggest issue in practice, and
one where tracepoints are way better.

  - raw performance: tracepoints are *significantly* faster than
kprobes (roughly 2-3x less overhead, see [0])

  - relative stability of tracepoints in terms of naming, semantics,
and arguments. While not stable APIs, tracepoints are "more stable"
in practice because they are (usually) placed more deliberately and
strategically, so they tend to get renamed or changed much less
frequently.

So, as far as BPF is concerned, tracepoints are still preferable to
kprobes for something like VFS, and just because BPF can be used with
kprobes easily doesn't mean BPF users don't need useful tracepoints.

  [0] https://patchwork.kernel.org/project/netdevbpf/patch/20240326162151.3981687-3-andrii@xxxxxxxxxx/

>
> Regardless of BPF, why not just send patches to add the tracepoints
> you want?
>
> -Dave.
> --
> Dave Chinner
> david@xxxxxxxxxxxxx
>