Re: [PATCH v2 bpf-next 1/2] libbpf: Support triple-underscore flavors for kfunc relocation

David Marchevsky <david.marchevsky@xxxxxxxxx> · Wed, 16 Aug 2023 19:40:35 -0400

On 8/16/23 3:28 PM, David Vernet wrote:
> On Wed, Aug 16, 2023 at 03:01:10PM -0400, David Marchevsky wrote:
>> On 8/16/23 1:37 PM, David Vernet wrote:
>>> On Wed, Aug 16, 2023 at 09:58:12AM -0700, Dave Marchevsky wrote:
>>>> The function signature of kfuncs can change at any time due to their
>>>> intentional lack of stability guarantees. As kfuncs become more widely
>>>> used, BPF program writers will need facilities to support calling
>>>> different versions of a kfunc from a single BPF object. Consider this
>>>> simplified example based on a real scenario we ran into at Meta:
>>>>
>>>>   /* initial kfunc signature */
>>>>   int some_kfunc(void *ptr)
>>>>
>>>>   /* Oops, we need to add some flag to modify behavior. No problem,
>>>>     change the kfunc. flags = 0 retains original behavior */
>>>>   int some_kfunc(void *ptr, long flags)
>>>>
>>>> If the initial version of the kfunc is deployed on some portion of the
>>>> fleet and the new version on the rest, a fleetwide service that uses
>>>> some_kfunc will currently need to load different BPF programs depending
>>>> on which some_kfunc is available.
>>>>
>>>> Luckily CO-RE provides a facility to solve a very similar problem,
>>>> struct definition changes, by allowing program writers to declare
>>>> my_struct___old and my_struct___new, with ___suffix being considered a
>>>> 'flavor' of the non-suffixed name and being ignored by
>>>> bpf_core_type_exists and similar calls.
>>>>
>>>> This patch extends the 'flavor' facility to the kfunc extern
>>>> relocation process. BPF program writers can now declare
>>>>
>>>>   extern int some_kfunc___old(void *ptr)
>>>>   extern int some_kfunc___new(void *ptr, int flags)
>>>>
>>>> then test which version of the kfunc exists with bpf_ksym_exists.
>>>> Relocation and verifier's dead code elimination will work in concert as
>>>> expected, allowing this pattern:
>>>>
>>>>   if (bpf_ksym_exists(some_kfunc___old))
>>>>     some_kfunc___old(ptr);
>>>>   else
>>>>     some_kfunc___new(ptr, 0);
>>>>
>>>> Changelog:
>>>>
>>>> v1 -> v2: https://lore.kernel.org/bpf/20230811201346.3240403-1-davemarchevsky@xxxxxx/
>>>>   * No need to check obj->externs[i].essent_name before zfree (Jiri)
>>>>   * Use strndup instead of replicating same functionality (Jiri)
>>>>   * Properly handle memory allocation falure (Stanislav)
>>>>
>>>> Signed-off-by: Dave Marchevsky <davemarchevsky@xxxxxx>
>>>> ---
>>>>  tools/lib/bpf/libbpf.c | 20 +++++++++++++++++++-
>>>>  1 file changed, 19 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
>>>> index b14a4376a86e..8899abc04b8c 100644
>>>> --- a/tools/lib/bpf/libbpf.c
>>>> +++ b/tools/lib/bpf/libbpf.c
>>>> @@ -550,6 +550,7 @@ struct extern_desc {
>>>>  	int btf_id;
>>>>  	int sec_btf_id;
>>>>  	const char *name;
>>>> +	char *essent_name;
>>>>  	bool is_set;
>>>>  	bool is_weak;
>>>>  	union {
>>>> @@ -3770,6 +3771,7 @@ static int bpf_object__collect_externs(struct bpf_object *obj)
>>>>  	struct extern_desc *ext;
>>>>  	int i, n, off, dummy_var_btf_id;
>>>>  	const char *ext_name, *sec_name;
>>>> +	size_t ext_essent_len;
>>>>  	Elf_Scn *scn;
>>>>  	Elf64_Shdr *sh;
>>>>  
>>>> @@ -3819,6 +3821,14 @@ static int bpf_object__collect_externs(struct bpf_object *obj)
>>>>  		ext->sym_idx = i;
>>>>  		ext->is_weak = ELF64_ST_BIND(sym->st_info) == STB_WEAK;
>>>>  
>>>> +		ext_essent_len = bpf_core_essential_name_len(ext->name);
>>>> +		ext->essent_name = NULL;
>>>> +		if (ext_essent_len != strlen(ext->name)) {
>>>> +			ext->essent_name = strndup(ext->name, ext_essent_len);
>>>> +			if (!ext->essent_name)
>>>> +				return -ENOMEM;
>>>> +		}
>>>> +
>>>>  		ext->sec_btf_id = find_extern_sec_btf_id(obj->btf, ext->btf_id);
>>>>  		if (ext->sec_btf_id <= 0) {
>>>>  			pr_warn("failed to find BTF for extern '%s' [%d] section: %d\n",
>>>> @@ -7624,7 +7634,8 @@ static int bpf_object__resolve_ksym_func_btf_id(struct bpf_object *obj,
>>>>  
>>>>  	local_func_proto_id = ext->ksym.type_id;
>>>>  
>>>> -	kfunc_id = find_ksym_btf_id(obj, ext->name, BTF_KIND_FUNC, &kern_btf, &mod_btf);
>>>> +	kfunc_id = find_ksym_btf_id(obj, ext->essent_name ?: ext->name, BTF_KIND_FUNC, &kern_btf,
>>>> +				    &mod_btf);
>>>>  	if (kfunc_id < 0) {
>>>>  		if (kfunc_id == -ESRCH && ext->is_weak)
>>>>  			return 0;
>>>> @@ -7642,6 +7653,9 @@ static int bpf_object__resolve_ksym_func_btf_id(struct bpf_object *obj,
>>>>  		pr_warn("extern (func ksym) '%s': func_proto [%d] incompatible with %s [%d]\n",
>>>>  			ext->name, local_func_proto_id,
>>>
>>> Should we do ext->essent_name ?: ext->name here or in the below pr's as
>>> well? Hmm, maybe it would be more clear to keep the full name.
>>>
>>
>> Yeah, I agree that the full name should be used in this warning for clarity.
>> So won't change.
>>
>>>>  			mod_btf ? mod_btf->name : "vmlinux", kfunc_proto_id);
>>>> +
>>>> +		if (ext->is_weak)
>>>> +			return 0;
>>>
>>> Could you clarify why we want this check? Don't we want to fail if the
>>> prototype of the actual (essent) symbol we resolve to doesn't match
>>> what's in the BPF prog? If we do want to keep this, should we do the
>>> check above the pr_warn()?
>>>
>>
>> Actually this if-and-return was initially above the pr_warn while I was
>> developing the patch. I moved it down here to confirm via './test_progs -vv'
>> that the pseudo-failure cases in the selftests were going down the codepaths
>> I expected, and left it b/c better to err on the side of too much logging
>> when doing this ___flavor trickery.
> 
> Normally I'd agree, but this is also a pr_warn(), so it goes a bit
> beyond logging IMO. FWIW, I'd vote for erring on the side of matching
> the existing behavior of other __weak special symbol resolution.
> 
> Edit: Saw your other comment below, which I've responded to more
> substantively below as well.
> 

I will respond down there too.

>>
>> In re: "clarify why we want this check?" and subsequent question, IIUC, with an
>> extern decl like
>>
>>   struct task_struct *bpf_task_acquire___one(struct task_struct *task) __ksym __weak;
>>
>> if we removed __weak from the declaration, symbol resolution would happen during
>> compilation + linking, at which point there would be no opportunity to do
>> our ___flavor trickery. But __weak is already used to express "if this kfunc
>> doesn't exist at all, it's not a problem, don't fail loading the program". So
> 
> To clarify -- I wasn't asking why we need to specify __weak, I was
> asking why you added an additional check for __weak on this branch. The
> original check on the find_ksym_btf_id() path made sense to me:
> 
> kfunc_id = find_ksym_btf_id(obj, ext->essent_name ?: ext->name, BTF_KIND_FUNC, &kern_btf,
> 			    &mod_btf);
> if (kfunc_id < 0) {
> 	if (kfunc_id == -ESRCH && ext->is_weak)
> 		return 0;
> 

Ah nice catch, I didn't notice that behavior right up there.
Will match existing behavior.

> If the symbol isn't found, it's weak, so it's OK. This next check is
> saying if the symbol is found BUT it doesn't match the BTF type that the
> kernel expects, that it's OK if it's weak. I wasn't understanding why
> __weak would apply here (usually __weak is intended to mean "it's OK to
> override this symbol with another symbol with the exact same
> declaration", though in practice I don't think symbol resolution works
> that way in every dynamic linker). After thinking about it some more, I
> guess it's necessary to accommodate the case of e.g.
> bpf_task_acquire___three() not matching the BTF signature in the kernel,
> and allowing another symbol like bpf_task_acquire___one() to be resolved
> on another pass?
> 

Yeah precisely, before I added the return 0, any failing
bpf_core_types_are_compat match would cause the file to fail
compilation+link step. If the patch was applied without that
test-and-return, the ___flavor behavior could have only be
used to call the 'correct' func signature matching what's in running kernel
by multiple names.

Re: "usually __weak is intended to mean 'it's OK to override this symbol with
another symbol with the same exact declaration'" - with the caveat that I
didn't really use __weak in "normal" C, most docs that I came across when
googling __weak symbol usage show this pattern:

  if (&some_unresolved_weak_symbol)
    some_unresolved_weak_symbol(arg);

(note that bpf_ksym_exists is essentially !this test + static assert)

So presumably the "optionally use it if defined" scenario is roughly as common
as the "override" scenario. The special ___flavor name logic is the only
thing libbpf C is doing here that's new.

>> as of this version of the code, it's not possible to express "one of
>> bpf_task_acquire___{one,two,three} must resolve, otherwise fail to
>> load" - that check would have to be done at runtime like
>>
>>   if (!(bpf_ksym_exists(bpf_task_acquire___one) ||
>>         bpf_ksym_exists(bpf_task_acquire___two) ||
>>         bpf_ksym_exists(bpf_task_acquire___three)) {
>>     /* communicate failure to userspace runner via global var or something */
>>     return 0;
>>   }
>>
>> Maybe something like BTF tags could be used to group a set of __weak
>> kfunc declarations together such that one (probably _only_ one) of them
>> must resolve at load time. This would obviate the need for such a runtime
>> check without causing compile+link step to fail. But also seems overly
>> complex for now.
> 
> This does sound indeed useful. As explained above, I wasn't intending to
> imply that we didn't need __weak, but regardless this sounds like a nice
> to have at some point down the line.
> 
>> Feels useful to have "incompatible resolution" log message even if it doesn't
>> stop loading process. But because __weak ties ___flavor trickery to "not a 
>> problem if kfunc doesn't exist at all", probably more accurate to make the
>> pr_warn a pr_debug if ___flavor AND ext->is_weak. Adding the logic to do that
>> felt like it would raise more questions than answers to a future reader of the
>> code, so I didn't add it. Now that I'm writing this out, I think it's better
>> to add it along with a comment.
> 
> Yeah, I agree with you -- in my experience, it's common for automation
> to be setup that does nothing other than search for logs with level >=
> warn and raise some alert if any is encountered. I think it's best to
> keep the warn namespace relegated to logging actual error states for
> that reason. So I agree with you that I think this is the proper way
> forward, though IMHO I wouldn't even bother with the pr_debug() here,
> just like we don't with the find_ksym_btf_id() case above. It's fine if
> you want to add it though, I don't feel strongly either way.
> 
> Thanks,
> David