On 3/30/23 23:05, Andrii Nakryiko wrote:
On Thu, Mar 30, 2023 at 10:13 AM Lorenz Bauer <lmb@xxxxxxxxxxxxx> wrote:
On Wed, Mar 29, 2023 at 12:56 AM Andrii Nakryiko <andrii@xxxxxxxxxx> wrote:
This basically prevents any forward compatibility. And we either way
just return -EINVAL, which would otherwise be returned from bpf()
syscall anyways.
In your cover letter you make the argument that applications can opt
out of the behaviour, but I think shows that this isn't entirely true.
Apps linking old libbpf won't be able to fix their breakage without
updating libbpf. This is especially annoying when you have to support
multiple old versions where doing this isn't straightforward.
Ok, technically, you are correct. If you somehow managed to get a
bleeding edge kernel, but outdated libbpf, you won't be able to
specify log_level = 8. This is not the only place where too old libbpf
would limit you from using bleeding edge kernel features, though, and
we have to live with that (though try our best to avoid such
dependencies, of course).
But in practice you get the freshest libbpf way before your kernel
becomes the freshest one, so I don't think this is a big deal in
practice.
Hey Andrii,
In an ideal world, yes, but not how it works out in practice. Anything
that ships in a container obviously misses out on distro packages of the
underlying host that are tied to the running kernel. This looks like it's
quickly becoming a majority use case of bpf in the wild, barring of course
Android and systemd.
In practice, we get to deal with rather large version swings in both
directions; users may want to run the latest tools on 4.19, or last
year's stable tool release on 6.1, so the need for both backwards- and
forwards-compatibility is definitely there.
Fortunately, things don't break that often. :) The biggest papercut
we've had recently was the introduction of enum64, which just flat
out requires the latest userspace if kernel btf is needed.
Take this as another plea to make this opt in and instead work
together to make this a default on the lib side. :)
Please, help me understand the arguments against making rotating mode
a default, now that we return -ENOSPC on truncation. In which scenario
this difference matters?
I think there may be a misunderstanding. As you mentioned, there's rarely
anything useful at the start of the log. I think the opt-in behaviour
under discussion here is the lack of ENOSPC when the buffer is undersized.
Userspace should set a flag if it supports receiving a partial log without
expecting ENOSPC.
I've lost track of the exact changes that are now on the table with your
second patch set floating around. The way I understand it, there are
multiple isolated behavioural changes, so let's discuss them separately:
- Log will include the tail instead of the head. This is a good change, no
argument there, and personally I wouldn't bother with a flag until
someone complains. Old userspace is (worst case) already used to retrying
with bigger buffers until ENOSPC is gone, and (best case) gets a few
pages
of actually useful log if it doesn't retry.
- log_size_actual: if populated by the kernel, userspace can do a perfect
retry with the correct buffer size. Userspace will naturally be able to
pick this up when their bpf_attr gets updated. No opt-in/flags needed
because not a breaking change.
- No more ENOSPC: this needs to be opt-in, as old userspace relies on ENOSPC
to drive the retry loop. If the kernel no longer returns ENOSPC by
default,
userspace will think it received a full log and won't be able to detect
truncation if it doesn't yet know about the log_size_actual field.
From what I gather, we're also in agreement on this. Idea for a flag
name:
BPF_LOG_PARTIAL?
Hope this can move the discussion forward, it looked to me like we were just
talking past each other. Thanks for addressing our feedback so far!
1. If there is no truncation and the user provides a big enough buffer
(which my second patch set makes it even easier to do for libraries),
there is no difference, they get identical log contents and behavior.
2. If there was truncation, in both cases we get -ENOSPC. The contents
will differ. In one case we get the beginning of a long log with no
details about what actually caused the failure (useless in pretty much
any circumstances) versus you get the last N bytes of log, all the way
to actual error and some history leading towards it. Which is what we
used to debug and understand verification issues.
What is the situation where the beginning of the log is preferable? I
had exactly one case where I actually wanted the beginning of the log,
that was when I was debugging some bug in the verifier when
implementing open-coded iterators. This bug was happening early and
causing an infinite loop, so I wanted to see the first few pages of
the output to catch how it all started. But that was a development bug
of a tricky feature, definitely not something we expect for end users
to deal with. And it was literally *once* that I needed this.
Why are we fighting to preserve this much less useful behavior as a
default, if there is no reduction of functionality for end-users?
Library writers have full access to union bpf_attr and can opt-out
easily (though again, why?). Normal end users will never have to ask
for BPF_LOG_FIXED behavior. Maybe some advanced tool-building users
will want BPF_LOG_FIXED (like veristat, for example), but then it's in
their best interest to have fresh enough libbpf anyways.
So instead of "I want X over Y", let's discuss "*why* X is better than Y"?
Apps linking old libbpf won't be able to fix their breakage without
updating libbpf. This is especially annoying when you have to support
What sort of breakage would be there to fix?
Also keep in mind that not all use cases use BPF library's high-level
code that does all this fancy log buf manipulations. There are
legitimate cases where tools/applications want direct access to
log_buf, so needing to do extra feature detection to get rotating mode
(but falling back without failing to fixed mode on the old kernel) is
just an unnecessary nuisance.