On Wed, Apr 3, 2024 at 1:52 PM Martin KaFai Lau <martin.lau@xxxxxxxxx> wrote:
>
> On 4/2/24 10:00 AM, Kui-Feng Lee wrote:
> >
> >
> >
> > On 4/1/24 18:43, John Fastabend wrote:
> >> Kui-Feng Lee wrote:
> >>> The verifier in the kernel checks the signatures of struct_ops
> >>> operators. Libbpf should not verify it in order to allow flexibility in
>
> This description probably is not accurate. iirc, the verifier does not check the
> function signature either. The verifier rejects only when the struct_ops prog
> tries to access something invalid. e.g. reading a function argument that does
> not exist in the running kernel.
>
> >>> loading different implementations of an operator with different signatures
> >>> to try to comply with the kernel, even if the signature defined in the BPF
> >>> programs does not match with the implementations and the kernel.
>
> >>> This feature enables user space applications to manage the variations
> >>> between different versions of the kernel by attempting various
> >>> implementations of an operator.
> >>
> >> What is the utility of this? I'm missing what difference it would be
> >> if libbpf rejected vs kernel rejecting it? For backwards compat the
> >> kernel will fail or libbpf might throw an error and user will have to
> >> fixup signature regardless right? Why not get the error as early as
> >> possible.
> >
> > The check described here is that libbpf compares BTF types of functions
> > and function pointers in struct_ops types in BPF programs, which may
> > differ from kernel definitions.
> >
> > A scenario here is a struct_ops type that includes an operator op_A with
> > different versions depending on the kernel. All other fields in the
> > struct_ops type have the same types. The application has only one
> > definition for this struct_ops type, but the implementation of op_A is
> > done separately for each version.
> >
> > The application can try variations by assigning implementations to the
> > op_A field until one is accepted by the kernel if libbpf doesn’t enforce
>
> It probably would be clearer if the test actually does the retry. e.g. Try to
> load a struct_ops prog which reads an extra arg that is not supported by the
> running kernel and gets rejected by verifier. Then assigns an older struct_ops
> prog to the skel->struct_ops...->fn and loads successfully by the verifier.
>

This is actually a discouraged practice. In production, user-space
logic does feature detection (using BTF or whatever else is necessary)
and then decides on a specific BPF program implementation. So I
wouldn't overstress this trial-and-error approach in tests; it's a bad
and sloppy practice.
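
To make the detect-then-select pattern concrete, here is a minimal
sketch in plain C so that it stands alone. Everything named in it
(op_a_v1, op_a_v2, select_op_a, and the argument counts 2 and 3) is
hypothetical and only for illustration. In a real loader the argument
count would come from kernel BTF, for example by loading it with
libbpf's btf__load_vmlinux_btf() and inspecting the function prototype
of the struct_ops member, rather than being passed in as an integer.

```c
#include <stddef.h>

/* Hypothetical labels for two BPF-side implementations of op_A. */
enum op_a_impl {
	OP_A_NONE = 0,	/* no compatible implementation available */
	OP_A_V1,	/* written against an older 2-argument signature */
	OP_A_V2,	/* additionally reads a third argument */
};

/*
 * Hypothetical selector: kernel_nr_args stands in for the number of
 * arguments the running kernel passes to op_A, as discovered up front
 * (e.g. from BTF). A negative value means detection failed.
 *
 * Assumption: an implementation that reads N arguments is acceptable
 * on any kernel that provides at least N of them.
 */
static enum op_a_impl select_op_a(int kernel_nr_args)
{
	if (kernel_nr_args >= 3)
		return OP_A_V2;
	if (kernel_nr_args == 2)
		return OP_A_V1;
	return OP_A_NONE;
}

/* Map the choice to a (hypothetical) BPF program name, or NULL. */
static const char *op_a_impl_name(enum op_a_impl impl)
{
	switch (impl) {
	case OP_A_V1:
		return "op_a_v1";
	case OP_A_V2:
		return "op_a_v2";
	default:
		return NULL;
	}
}
```

With the choice made before load, the loader would assign only the
selected program to the struct_ops field, so the verifier sees a single
candidate instead of a series of rejected attempts.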