Libbpf is a crucial part of the BPF ecosystem and defines the modern way to build and run BPF applications. As such, it's important that it is well-tested, reliable, and works seamlessly across multiple kernel versions. Until recently, the only testing performed was the BPF selftests, run manually by BPF maintainers against bleeding-edge kernel versions. As diligent as maintainers are, this setup is far from perfect: it requires a lot of manual work and can still miss regressions and bugs due to kernel and environment differences. Catching regressions on old kernels was especially problematic and has led to real problems in production at Facebook. This seemed like a problem that needed automation.

We took the idea of our internal VMTEST framework, which allows running application integration tests against a range of kernels to catch problems, and applied it to the open-source GitHub mirror [0] of libbpf. We built upon Omar Sandoval's <osandov@xxxxxx> initial implementation for his drgn tool [1] and adapted it to libbpf's needs. It saved many hours of tinkering with generic qemu/Linux image setup! Julia Kartseva <hex@xxxxxx> spent a lot of time and effort bringing this workflow to libbpf and making the process robust and maintainable.

Now, with each change to libbpf, we pull and compile the latest kernel and the latest BPF selftests, built against the libbpf patches under test. Next, a VM with that kernel boots and runs a battery of tests (test_progs, test_verifier, and test_maps), verifying that both libbpf and the kernel still work as expected. Further, to verify that libbpf didn't regress on older kernels, we download a set of older kernels and run the supported subset of tests against each of them (a simplified sketch of this flow is shown after the bug list below). This gives us confidence that no matter how bleeding-edge a libbpf version you use, it will still work fine across all kernels. Check out a typical Travis CI test run [2] to get a better idea. You can also see an annotated list [3] of blacklisted tests for older kernels.

# Why does this matter?

- It's all about confidence when making BPF changes and about maintaining user trust. Automated, repeatable testing on **every** change to libbpf is crucial for allowing BPF developers to move fast and iterate quickly, while ensuring there is no inadvertent breakage of BPF applications. The more libbpf is integrated into critical applications (systemd, iproute2, bpftool, BCC tools, as well as a multitude of internal apps across private companies), the more important this becomes.
- A well-tested and maintained libbpf GitHub mirror (as opposed to building from kernel sources) as a single source of truth is important for package maintainers to ensure consistent libbpf versioning across different Linux distributions. This results in a better user experience overall, and everyone wins from this consistency.
- This is also a good base for more general kernel testing, given that this setup exercises not just libbpf, but the kernel itself as well. With a bit more automation, it is possible to proactively apply upstream patches and test kernel changes, saving BPF maintainers tons of time and speeding up the patch review process.

In the short time we've had this running, this setup has already caught kernel, libbpf, and selftest bugs (and will undoubtedly catch more):

- BPF trampoline assembly bug [4];
- Kprobe tests triggering bug [5];
- Test cleanup crashes [6];
- Test flakiness [7];
- Quite a few libbpf-specific problems that we never got around to tracking explicitly...
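For illustration only, here is a minimal Python sketch of what such a multi-kernel VM test loop might look like. This is not the actual CI code (which lives under travis-ci/vmtest in the libbpf repo [3]); the kernel image paths, the qemu command line, the rootfs image name, and the blacklist file format are assumptions, and the real setup filters individual selftest subtests rather than whole test binaries.

```python
#!/usr/bin/env python3
# Hypothetical sketch of a multi-kernel VM test loop, loosely mirroring the
# CI flow described above. Paths, kernel versions, and qemu flags are
# illustrative assumptions, not the actual libbpf CI scripts.
import subprocess
import sys

# Selftest binaries built from the latest kernel sources against the
# libbpf patches under test.
TEST_BINARIES = ["test_progs", "test_verifier", "test_maps"]

# Kernels to boot: the freshly built bleeding-edge kernel plus a set of
# prebuilt older kernels (versions here are just examples).
KERNELS = ["latest/vmlinuz", "5.5.0/vmlinuz", "4.9.0/vmlinuz"]

def read_blacklist(path):
    """Read a per-kernel list of tests known not to work on that kernel.
    Assumed format: one test name per line, '#' starts a comment."""
    try:
        with open(path) as f:
            return {line.split("#")[0].strip()
                    for line in f if line.split("#")[0].strip()}
    except FileNotFoundError:
        return set()

def run_vm(kernel, image, command):
    """Boot a VM with the given kernel and rootfs image and run `command`
    inside it. Real CI collects results from the VM console; returning
    qemu's exit code here is a simplification."""
    return subprocess.run([
        "qemu-system-x86_64", "-nographic", "-enable-kvm",
        "-m", "2G", "-kernel", kernel,
        "-drive", f"file={image},format=raw",
        "-append", f"root=/dev/sda rw console=ttyS0 init={command}",
    ]).returncode

def main():
    failed = False
    for kernel in KERNELS:
        version = kernel.split("/")[0]
        blacklist = read_blacklist(f"configs/blacklist/BLACKLIST-{version}")
        for test in TEST_BINARIES:
            if test in blacklist:
                continue  # skip tests that need features this kernel lacks
            if run_vm(kernel, "rootfs.img", f"/selftests/bpf/{test}") != 0:
                print(f"FAIL: {test} on kernel {version}", file=sys.stderr)
                failed = True
    sys.exit(1 if failed else 0)

if __name__ == "__main__":
    main()
```

The key design point the sketch tries to capture is the per-kernel blacklist: newer selftests keep running unchanged on old kernels, and only the tests that genuinely require newer kernel features are skipped, so a libbpf regression on an old kernel still shows up as a test failure.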
[0] https://github.com/libbpf/libbpf
[1] https://github.com/osandov/drgn
[2] https://travis-ci.org/github/libbpf/libbpf/builds/663674948
[3] https://github.com/libbpf/libbpf/blob/master/travis-ci/vmtest/configs/blacklist/BLACKLIST-5.5.0
[4] https://lore.kernel.org/netdev/20200311003906.3643037-1-ast@xxxxxxxxxx/
[5] https://patchwork.ozlabs.org/patch/1254743/
[6] https://lore.kernel.org/netdev/20200220230546.769250-1-andriin@xxxxxx/
[7] https://lore.kernel.org/bpf/20200314024855.ugbvrmqkfq7kao75@xxxxxxxxxxxxxxxxxxxxxxxxxxxx/T/#ma733d8e9840d9f91ce20d1143a429aa0d6650959