Re: [PATCH bpf-next 3/6] bpf: switch BPF verifier log to be a rotating log by default

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Andrii!

This is a great idea. Preserving the end of the log is the right thing
to do, and I agree that it would've been nice to do this from the
start. Some thoughts on the approach below, based on a discussion I
had with Robin and Timo.

On Fri, Mar 17, 2023 at 11:13 PM Andrii Nakryiko
<andrii.nakryiko@xxxxxxxxx> wrote:
>
> Libbpf by default uses 16MB right now, I didn't touch that part yet,
> but it might be good to reduce this size if the kernel supports this
> new behavior. Go library seems to be using 64KB, which probably would
> be adequate for a lot of cases, but yeah, maybe Go library would like
> to use slightly bigger default if rotating log behavior is supported
> by kernel?

cilium/ebpf relies on ENOSPC to figure out if the verifier log has
been truncated, and makes that information available in the API via
VerifierError.Truncated [0]. This happens so that cilium can retry the
program load with successively larger log buffers to capture the full
verifier log. The full log is very valuable to us since we don't
operate on our own infrastructure, so reproducing an error that a
customer has on our own hardware is not always straightforward.
Instead we include the full log in sysdumps. We want to preserve this
as much as possible.

Arguably 64KB might already be on the large side! The library
currently does the "execute PROG_LOAD with log disabled first, get
debug output later" trick (which libbpf also does I think?). It'd be
nice if we could right-size the buffer on the retry.

> Alternative would be to make this rotating behavior opt-in, but that
> would require active actions by multiple libraries and applications to
> actively detect support. Given the rotating behavior seems like a good
> default behavior I wish we had from the very beginning, I went for
> making it default.

Given the above, taking away ENOSPC is problematic because an old
cilium/ebpf on a new kernel doesn't know how to get the full log. I
think it's better to make this opt in, since it also makes it easier
to reason about from user space. Working with UAPI that changes
behaviour based on kernel version is harder than doing feature
probing.

I do agree that ENOSPC is not a great API however and that it doesn't
make sense for the rotated log to return it. It's a really big hammer
that doesn't actually give us a key bit of information: how big should
the buffer have been to obtain the full log. To recap, cilium/ebpf
needs to know when the log was truncated and would like to know how
big the buffer should be. How about the following:

- To avoid breaking old libs on new kernels we make the behaviour opt
in via BPF_LOG_TRUNCATE_HEAD or similar.
- We add a field log_size_full to bpf_attr for PROG_LOAD. In the
kernel we populate it with the buffer size that is required to
retrieve an untruncated log for the given flags. log_size_full must be
0 when entering the syscall.
- If BPF_LOG_TRUNCATE_HEAD is specified, we enable rotation and never
return ENOSPC.

>From the user space side, we can use the API as follows:

- If errno = ENOSPC or log_size < log_size_full we can deduce that the
log was truncated.
- If log_size_full > 0, we can use it to right-size the log buffer and
retry only once instead of the increase-buf-size-and-retry loop.

FWIW, BTF_LOAD would benefit from the same improvements!

Best
Lorenz

0: https://pkg.go.dev/github.com/cilium/ebpf@v0.10.0/internal#VerifierError




[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux