Hi Andrii! This is a great idea. Preserving the end of the log is the right thing to do, and I agree that it would've been nice to do this from the start. Some thoughts on the approach below, based on a discussion I had with Robin and Timo. On Fri, Mar 17, 2023 at 11:13 PM Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> wrote: > > Libbpf by default uses 16MB right now, I didn't touch that part yet, > but it might be good to reduce this size if the kernel supports this > new behavior. Go library seems to be using 64KB, which probably would > be adequate for a lot of cases, but yeah, maybe Go library would like > to use slightly bigger default if rotating log behavior is supported > by kernel? cilium/ebpf relies on ENOSPC to figure out if the verifier log has been truncated, and makes that information available in the API via VerifierError.Truncated [0]. This happens so that cilium can retry the program load with successively larger log buffers to capture the full verifier log. The full log is very valuable to us since we don't operate on our own infrastructure, so reproducing an error that a customer has on our own hardware is not always straightforward. Instead we include the full log in sysdumps. We want to preserve this as much as possible. Arguably 64KB might already be on the large side! The library currently does the "execute PROG_LOAD with log disabled first, get debug output later" trick (which libbpf also does I think?). It'd be nice if we could right-size the buffer on the retry. > Alternative would be to make this rotating behavior opt-in, but that > would require active actions by multiple libraries and applications to > actively detect support. Given the rotating behavior seems like a good > default behavior I wish we had from the very beginning, I went for > making it default. Given the above, taking away ENOSPC is problematic because an old cilium/ebpf on a new kernel doesn't know how to get the full log. I think it's better to make this opt in, since it also makes it easier to reason about from user space. Working with UAPI that changes behaviour based on kernel version is harder than doing feature probing. I do agree that ENOSPC is not a great API however and that it doesn't make sense for the rotated log to return it. It's a really big hammer that doesn't actually give us a key bit of information: how big should the buffer have been to obtain the full log. To recap, cilium/ebpf needs to know when the log was truncated and would like to know how big the buffer should be. How about the following: - To avoid breaking old libs on new kernels we make the behaviour opt in via BPF_LOG_TRUNCATE_HEAD or similar. - We add a field log_size_full to bpf_attr for PROG_LOAD. In the kernel we populate it with the buffer size that is required to retrieve an untruncated log for the given flags. log_size_full must be 0 when entering the syscall. - If BPF_LOG_TRUNCATE_HEAD is specified, we enable rotation and never return ENOSPC. >From the user space side, we can use the API as follows: - If errno = ENOSPC or log_size < log_size_full we can deduce that the log was truncated. - If log_size_full > 0, we can use it to right-size the log buffer and retry only once instead of the increase-buf-size-and-retry loop. FWIW, BTF_LOAD would benefit from the same improvements! Best Lorenz 0: https://pkg.go.dev/github.com/cilium/ebpf@v0.10.0/internal#VerifierError