Re: [PATCH bpf-next 5/6] libbpf: support libbpf-provided extern variables

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 11/19/19 7:42 AM, Andrii Nakryiko wrote:
> On Mon, Nov 18, 2019 at 10:57 PM Andrii Nakryiko
> <andrii.nakryiko@xxxxxxxxx> wrote:
>>
>> On Mon, Nov 18, 2019 at 7:21 PM Alexei Starovoitov
>> <alexei.starovoitov@xxxxxxxxx> wrote:
>>>
>>> On Sat, Nov 16, 2019 at 11:08:06PM -0800, Andrii Nakryiko wrote:
>>>> Add support for extern variables, provided to BPF program by libbpf. Currently
>>>> the following extern variables are supported:
>>>>    - LINUX_KERNEL_VERSION; version of a kernel in which BPF program is
>>>>      executing, follows KERNEL_VERSION() macro convention;
>>>>    - CONFIG_xxx values; a set of values of actual kernel config. Tristate,
>>>>      boolean, and integer values are supported. Strings are not supported at
>>>>      the moment.
>>>>
>>>> All values are represented as 64-bit integers, with the follow value encoding:
>>>>    - for boolean values, y is 1, n or missing value is 0;
>>>>    - for tristate values, y is 1, m is 2, n or missing value is 0;
>>>>    - for integers, the values is 64-bit integer, sign-extended, if negative; if
>>>>      config value is missing, it's represented as 0, which makes explicit 0 and
>>>>      missing config value indistinguishable. If this will turn out to be
>>>>      a problem in practice, we'll need to deal with it somehow.
>>>
>>> I read that statement as there is no extensibility for such api.
>>
>> What do you mean exactly?
>>
>> Are you worried about 0 vs undefined case? I don't think it's going to
>> be a problem in practice. Looking at my .config, I see that integer
>> config values set to their default values are still explicitly
>> specified with those values. E.g.,
>>
>> CONFIG_HZ_1000=y
>> CONFIG_HZ=1000
>>
>> CONFIG_HZ default is 1000, if CONFIG_HZ_1000==y, but still I see it
>> set. So while I won't claim that it's the case for any possible
>> integer config, it seems to be pretty consistent in practice.
>>
>> Also, I see a lot of values set to explicit 0, like:
>>
>> CONFIG_BASE_SMALL=0
>>
>> So it seems like integers are typically spelled out explicitly in real
>> configs and I think this 0 default is pretty sane.
>>
>> Next, speaking about extensibility. Once we have BTF type info for
>> externs, our possibilities are much better. It will be possible to
>> support bool, int, in64 for the same bool value. Libbpf will be able
>> to validate the range and fail program load if declared extern type
>> doesn't match actual value type and value range. So I think
>> extensibility is there, but right now we are enforcing (logically)
>> everything to be uin64_t. Unfortunately, with the way externs are done
>> in ELF, I don't know neither type nor size, so can't be more strict
>> than that.
>>
>> If we really need to know whether some config value is defined or not,
>> regardless of its value, we can have it by convention. E.g.,
>> CONFIG_DEFINED_XXX will be either 0 or 1, depending if corresponding
>> CONFIG_XXX is defined explicitly or not. But I don't want to add that
>> until we really have a use case where it matters.
>>
>>>
>>>> Generally speaking, libbpf is not aware of which CONFIG_XXX values is of which
>>>> expected type (bool, tristate, int), so it doesn't enforce any specific set of
>>>> values and just parses n/y/m as 0/1/2, respectively. CONFIG_XXX values not
>>>> found in config file are set to 0.
>>>
>>> This is not pretty either.
>>
>> What exactly: defaulting to zero or not knowing config value's type?
>> Given all the options, defaulting to zero seems like the best way to
>> go.
>>
>>>
>>>> +
>>>> +             switch (*value) {
>>>> +             case 'n':
>>>> +                     *ext_val = 0;
>>>> +                     break;
>>>> +             case 'y':
>>>> +                     *ext_val = 1;
>>>> +                     break;
>>>> +             case 'm':
>>>> +                     *ext_val = 2;
>>>> +                     break;
>>
>> reading some more code from scripts/kconfig/symbol.c, I'll need to
>> handle N/Y/M and 0x hexadecimals, will add in v2 after collecting some
>> more feedback on this version.
>>
>>>> +             case '"':
>>>> +                     pr_warn("extern '%s': strings are not supported\n",
>>>> +                             ext->name);
>>>> +                     err = -EINVAL;
>>>> +                     goto out;
>>>> +             default:
>>>> +                     errno = 0;
>>>> +                     *ext_val = strtoull(value, &value_end, 10);
>>>> +                     if (errno) {
>>>> +                             err = -errno;
>>>> +                             pr_warn("extern '%s': failed to parse value: %d\n",
>>>> +                                     ext->name, err);
>>>> +                             goto out;
>>>> +                     }
>>>
>>> BPF has bpf_strtol() helper. I think it would be cleaner to pass whatever
>>> .config has as bytes to the program and let program parse n/y/m, strings and
>>> integers.
>>
>> Config value is not changing. This is an incredible waste of CPU
>> resources to re-parse same value over and over again. And it's
>> incredibly much worse usability as well. Again, once we have BTF for
>> externs, we can just declare values as const char[] and then user will
>> be able to do its own parsing. Until then, I think pre-parsing values
>> into convenient u64 types are much better and handles all the typical
>> cases.
> 
> 
> One more thing I didn't realize I didn't state explicitly, because
> I've been thinking and talking about that for so long now, that it
> kind of internalized completely.
> 
> These externs, including CONFIG_XXX ones, are meant to interoperate
> nicely with field relocations within BPF CO-RE concept. They are,
> among other things, are meant to disable parts of BPF program logic
> through verifier's dead code elimination by doing something like:
> 
> 
> if (CONFIG_SOME_FEATURES_ENABLED) {
>      BPF_CORE_READ(t, some_extra_field);
>      /* or */
>      bpf_helper_that_only_present_when_feature_is_enabled();
> } else {
>      /* fallback logic */
> }
> 
> With CONFIG_SOME_FEATURES_ENABLED not being a read-only integer
> constant when BPF program is loaded, this is impossible. So it
> absolutely must be some sort of easy to use integer constant. 

Hmm. what difference do you see between u64 and char[] ?
The const propagation logic in the verifier should work the same way.
If it doesn't it's a bug in the verifier and it's not ok to hack
extern api to workaround the bug.

What you're advocating with libbpf-side of conversion to integers
reminds me of our earlier attempts with cgroup_sysctl hooks where
we started with ints only to realize that in practice it's too
limited. Then bpf_strtol was introduced and api got much cleaner.
Same thing here. Converting char[] into ints or whatever else
is the job of the program. Not of libbpf. The verifier can be taught
to optimize bpf_strtol() into const when const char[] is passed in.

As far as is_enabled() check doing it as 0/1 the way you're proposing
has in-band signaling issues that you admitted in the commit log.
For is_enabled() may be new builtin() on llvm side would be better?
Something like __builtin_preserve_field_info(field, BPF_FIELD_EXISTS)
but can be used on _any_ extern function or variable.
Like __builtin_is_extern_resolved(extern_name);
Then on libbpf side CONFIG_* that are not in config.gz won't be seen
by the program (instead of seen as 0 in your proposal) and the code
will look like:
if (__builtin_is_extern_resolved(CONFIG_NETNS)) {
   ..do things;
} else {
}
The verifier dead code elimination will take care of branches.
The BPF program itself doesn't need to read the value of CONFIG_
it only needs to know whether it was defined.
Such builtin would match semantics better.
If CONFIG_ is tri-state doing
if (*(u8*)CONFIG_FOO == 'y' || *(u8*)CONFIG_FOO == 'm')
is cleaner than *(u64*)CONFIG_FOO == 1 || 2.
and constant propagation in the verifier should work the same way.




[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux