Re: [PATCH bpf-next v2 4/5] bpftool: add support for subskeletons

Delyan Kratunov <delyank@xxxxxx> · Mon, 14 Mar 2022 23:18:32 +0000

Thanks, Andrii! The nits/refactors are straightforward but have some questions
below:

On Fri, 2022-03-11 at 15:27 -0800, Andrii Nakryiko wrote:
> > +
> > +                       /* sanitize variable name, e.g., for static vars inside
> > +                        * a function, it's name is '<function name>.<variable name>',
> > +                        * which we'll turn into a '<function name>_<variable name>'.
> > +                        */
> > +                       sanitize_identifier(var_ident + 1);
> 
> btw, I think we don't need sanitization anymore. We needed it for
> static variables (they would be of the form <func_name>.<var_name> for
> static variables inside the functions), but now it's just unnecessary
> complication

How would we handle static variables inside functions in libraries then?

> 
> > +                       var_ident[0] = ' ';
> > +
> > +                       /* The datasec member has KIND_VAR but we want the
> > +                        * underlying type of the variable (e.g. KIND_INT).
> > +                        */
> > +                       var = btf__type_by_id(btf, var->type);
> 
> you need to use skip_mods_and_typedefs() or equivalent to skip any
> const/volatile/restrict modifiers before checking btf_is_array()

Good catch!

> 
> > +
> > +                       /* Prepend * to the field name to make it a pointer. */
> > +                       var_ident[0] = '*';
> > +
> > +                       printf("\t\t");
> > +                       /* Func and array members require special handling.
> > +                        * Instead of producing `typename *var`, they produce
> > +                        * `typeof(typename) *var`. This allows us to keep a
> > +                        * similar syntax where the identifier is just prefixed
> > +                        * by *, allowing us to ignore C declaration minutae.
> > +                        */
> > +                       if (btf_is_array(var) ||
> > +                           btf_is_ptr_to_func_proto(btf, var)) {
> > +                               printf("typeof(");
> > +                               /* print the type inside typeof() without a name */
> > +                               opts.field_name = "";
> > +                               err = btf_dump__emit_type_decl(d, var_type_id, &opts);
> > +                               if (err)
> > +                                       goto out;
> > +                               printf(") %s", var_ident);
> > +                       } else {
> > +                               err = btf_dump__emit_type_decl(d, var_type_id, &opts);
> > +                               if (err)
> > +                                       goto out;
> > +                       }
> > +                       printf(";\n");
> 
> we can simplify this a bit around var_ident and two
> btf_dump__emit_type_decl() invocations. We know that we are handling
> "non-uniform" C syntax for array and func pointer, so we don't need to
> use opts.field_name. Doing this (schematically) should work (taking
> into account no need for sanitization as well):
> 
> if (btf_is_array() || btf_is_ptr_to_func_proto())
>     printf("typeof(");
> btf_dump__emit_type_decl(... /* opts.field_name stays NULL */);
> printf(" *%s", var_name);
> 
> or did I miss some corner case?

You didn't close the "typeof" :) 

if (btf_is_array() || btf_is_ptr_to_func_proto())
     printf("typeof(");
btf_dump__emit_type_decl(... /* opts.field_name stays NULL */);
if (btf_is_array() || btf_is_ptr_to_func_proto())
     printf(")");
printf(" *%s", var_name);

If you feel that's easier to understand, sure. I don't love it but it's
understandable enough.

[...]

> we don't know the name of the final object, why would we allow to set
> any object name at all?

We don't really care about the final object name but we do need an object name
for the subskeleton. The subskeleton type name, header guard etc all use it.
We can say that it's always taken from the file name, but giving the user the
option to override it feels right, given the parallel with skeletons (and what
would we do if the file name is a pipe from a subshell invocation?).

> > 
> > +
> > +       /* The empty object name allows us to use bpf_map__name and produce
> > +        * ELF section names out of it. (".data" instead of "obj.data")
> > +        */
> > +       opts.object_name = "";
> 
> yep, like this. So that "name" argument "support" above is bogus,
> let's remove it

See above, it changes real things.

> 
> > +       obj = bpf_object__open_mem(obj_data, file_sz, &opts);
> > +       if (!obj) {
> > +               char err_buf[256];
> > +
> > +               libbpf_strerror(errno, err_buf, sizeof(err_buf));
> > +               p_err("failed to open BPF object file: %s", err_buf);
> > +               obj = NULL;
> > +               goto out;
> > +       }
> > +
> 
> [...]
> 
> > +               for (i = 0; i < len; i++, var++) {
> > +                       var_type = btf__type_by_id(btf, var->type);
> > +                       var_name = btf__name_by_offset(btf, var_type->name_off);
> > +
> > +                       if (btf_var(var_type)->linkage == BTF_VAR_STATIC)
> > +                               continue;
> > +
> > +                       var_ident[0] = '\0';
> > +                       strncat(var_ident, var_name, sizeof(var_ident) - 1);
> > +                       sanitize_identifier(var_ident);
> > +
> > +                       /* Note that we use the dot prefix in .data as the
> > +                        * field access operator i.e. maps%s becomes maps.data
> > +                        */
> > +                       codegen("\
> > +                       \n\
> > +                               s->vars[%4$d].name = \"%1$s\";              \n\
> > +                               s->vars[%4$d].map = &obj->maps%3$s;         \n\
> > +                               s->vars[%4$d].addr = (void**) &obj->%2$s.%1$s;\n\
> > +                       ", var_ident, ident, bpf_map__name(map), var_idx);
> 
> map reference should be using ident, not bpf_map__name(), as it refers
> to a field. The way it is now it shouldn't work for custom
> .data.my_section case (do you have a test for this?) You shouldn't
> need bpf_map__name() here at all.

Good catch, I'll add a .data.custom test.

[...]

> 
> > +               "       %1$s %2$s subskeleton FILE [name OBJECT_NAME]\n"
> 
> [name OBJECT_NAME] should be supported

Not sure what you mean by "supported" here.

> 
> >                 "       %1$s %2$s min_core_btf INPUT OUTPUT OBJECT [OBJECT...]\n"
> >                 "       %1$s %2$s help\n"
> >                 "\n"
> > @@ -1788,6 +2250,7 @@ static int do_min_core_btf(int argc, char **argv)
> >  static const struct cmd cmds[] = {
> >         { "object",             do_object },
> >         { "skeleton",           do_skeleton },
> > +       { "subskeleton",        do_subskeleton },
> >         { "min_core_btf",       do_min_core_btf},
> >         { "help",               do_help },
> >         { 0 }
> > --
> > 2.34.1

-- Delyan