Re: [PATCH v1 bpf-next 3/9] bpf: expose how xlated insns map to jitted insns

Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> · Thu, 14 Mar 2024 13:06:25 -0700

On Thu, Mar 14, 2024 at 10:07 AM Alexei Starovoitov
<alexei.starovoitov@xxxxxxxxx> wrote:
>
> On Thu, Mar 14, 2024 at 2:11 AM Anton Protopopov <aspsk@xxxxxxxxxxxxx> wrote:
> >
> > On Wed, Mar 13, 2024 at 06:56:34PM -0700, Alexei Starovoitov wrote:
> > > On Wed, Mar 6, 2024 at 2:51 AM Anton Protopopov <aspsk@xxxxxxxxxxxxx> wrote:
> > > >
> > > > On Tue, Feb 20, 2024 at 05:09:36PM -0800, Alexei Starovoitov wrote:
> > > > > On Fri, Feb 16, 2024 at 6:04 AM Anton Protopopov <aspsk@xxxxxxxxxxxxx> wrote:
> > > > > >
> > > > > > On Wed, Feb 14, 2024 at 10:48:26PM -0800, Alexei Starovoitov wrote:
> > > > > > > On Thu, Feb 8, 2024 at 3:11 AM Anton Protopopov <aspsk@xxxxxxxxxxxxx> wrote:
> > > > > > > >
> > > > > > > > On Tue, Feb 06, 2024 at 06:26:12PM -0800, Alexei Starovoitov wrote:
> > > > > > > > > On Tue, Feb 6, 2024 at 2:08 AM Anton Protopopov <aspsk@xxxxxxxxxxxxx> wrote:
> > > > > > > > > >
> > > > > > > > > > On Mon, Feb 05, 2024 at 05:09:51PM -0800, Alexei Starovoitov wrote:
> > > > > > > > > > > On Fri, Feb 2, 2024 at 8:34 AM Anton Protopopov <aspsk@xxxxxxxxxxxxx> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > > > > > > > > > > > index 4def3dde35f6..bdd6be718e82 100644
> > > > > > > > > > > > --- a/include/linux/bpf.h
> > > > > > > > > > > > +++ b/include/linux/bpf.h
> > > > > > > > > > > > @@ -1524,6 +1524,13 @@ struct bpf_prog_aux {
> > > > > > > > > > > >         };
> > > > > > > > > > > >         /* an array of original indexes for all xlated instructions */
> > > > > > > > > > > >         u32 *orig_idx;
> > > > > > > > > > > > +       /* for every xlated instruction point to all generated jited
> > > > > > > > > > > > +        * instructions, if allocated
> > > > > > > > > > > > +        */
> > > > > > > > > > > > +       struct {
> > > > > > > > > > > > +               u32 off;        /* local offset in the jitted code */
> > > > > > > > > > > > +               u32 len;        /* the total len of generated jit code */
> > > > > > > > > > > > +       } *xlated_to_jit;
> > > > > > > > > > >
> > > > > > > > > > > Simply put Nack to this approach.
> > > > > > > > > > >
> > > > > > > > > > > Patches 2 and 3 add an extreme amount of memory overhead.
> > > > > > > > > > >
> > > > > > > > > > > As we discussed during office hours we need a "pointer to insn" concept
> > > > > > > > > > > aka "index on insn".
> > > > > > > > > > > The verifier would need to track that such things exist and adjust
> > > > > > > > > > > indices of insns when patching affects those indices.
> > > > > > > > > > >
> > > > > > > > > > > For every static branch there will be one such "pointer to insn".
> > > > > > > > > > > Different algorithms can be used to keep them correct.
> > > > > > > > > > > The simplest 'lets iterate over all such pointers and update them'
> > > > > > > > > > > during patch_insn() may even be ok to start.
> > > > > > > > > > >
> > > > > > > > > > > Such "pointer to insn" won't add any memory overhead.
> > > > > > > > > > > When patch+jit is done all such "pointer to insn" are fixed value.
> > > > > > > > > >
> > > > > > > > > > Ok, thanks for looking, this makes sense.
> > > > > > > > >
> > > > > > > > > Before jumping into coding I think it would be good to discuss
> > > > > > > > > the design first.
> > > > > > > > > I'm thinking such "address of insn" will be similar to
> > > > > > > > > existing "address of subprog",
> > > > > > > > > which is encoded in ld_imm64 as BPF_PSEUDO_FUNC.
> > > > > > > > > "address of insn" would be a bit more involved to track
> > > > > > > > > during JIT and likely trivial during insn patching,
> > > > > > > > > since we're already doing imm adjustment for pseudo_func.
> > > > > > > > > So that part of design is straightforward.
> > > > > > > > > Implementation in the kernel and libbpf can copy paste from pseudo_func too.
> > > > > > > >
> > > > > > > > To implement the "primitive version" of static branches, where the
> > > > > > > > only API is `static_branch_update(xlated off, on/off)` the only
> > > > > > > > requirement is to build `xlated -> jitted` mapping (which is done
> > > > > > > > in JIT, after the verification). This can be done in a simplified
> > > > > > > > version of this patch, without xlated->orig mapping and with
> > > > > > > > xlated->jit mapping only done to gotol_or_nop instructions.
> > > > > > >
> > > > > > > yes. The array of insn->jit_addr sized with as many goto_or_nop-s
> > > > > > > the prog will work for user space to flip them, but...
> > > > > > >
> > > > > > > > The "address of insn" appears when we want to provide a more
> > > > > > > > higher-level API when some object (in user-space or in kernel) keeps
> > > > > > > > track of one or more gotol_or_nop instructions so that after the
> > > > > > > > program load this controlling object has a list of xlated offsets.
> > > > > > > > But this would be a follow-up to the initial static branches patch.
> > > > > > >
> > > > > > > this won't work as a follow up,
> > > > > > > since such an array won't work for bpf prog that wants to flip branches.
> > > > > > > There is nothing that associates static_branch name/id with
> > > > > > > particular goto_or_nop.
> > > > > > > There could be a kfunc that bpf prog calls, but it can only
> > > > > > > flip all of such insns in the prog.
> > > > > > > Unless we start encoding a special id inside goto_or_nop or other hacks.
> > > > > > >
> > > > > > > > > The question is whether such "address of insn" should be allowed
> > > > > > > > > in the data section. If so, we need to brainstorm how to
> > > > > > > > > do it cleanly.
> > > > > > > > > We had various hacks for similar things in the past. Like prog_array.
> > > > > > > > > Let's not repeat such mistakes.
> > > > > > > >
> > > > > > > > So, data section is required for implementing jump tables? Like,
> > > > > > > > to add a new PTR_TO_LABEL or PTR_TO_INSN data type, and a
> > > > > > > > corresponding "ptr to insn" object for every occurence of &&label,
> > > > > > > > which will be adjusted during verification.
> > > > > > > > Looks to me like this one doesn't require any more API than specifying
> > > > > > > > a list of &&label occurencies on program load.
> > > > > > > >
> > > > > > > > For "static keys" though (a feature on top of this patch series) we
> > > > > > > > need to have access to the corresponding set of adjusted pointers.
> > > > > > > >
> > > > > > > > Isn't this enough to add something like an array of
> > > > > > > >
> > > > > > > >   struct insn_ptr {
> > > > > > > >       u32 type; /* LABEL, STATIC_BRANCH,... */
> > > > > > > >       u32 insn_off; /* original offset on load */
> > > > > > > >       union {
> > > > > > > >           struct label {...};
> > > > > > > >           struct st_branch { u32 key_id, ..};
> > > > > > > >       };
> > > > > > > >   };
> > > > > > >
> > > > > > > which I don't like because it hard codes static_branch needs into
> > > > > > > insn->jit_addr association.
> > > > > > > "address of insn" should be an individual building block without
> > > > > > > bolted on parts.
> > > > > > >
> > > > > > > A data section with a set of such "address of insn"
> > > > > > > can be a description of one static_branch.
> > > > > > > There will be different ways to combine such building blocks.
> > > > > > > For example:
> > > > > > > static_branch(foo) can emit goto_or_nop into bpf code
> > > > > > > and add "address of insn" into a section '.insn_addrs.foo".
> > > > > > > This section is what libbpf and bpf prog will recognize as a set
> > > > > > > of "address of insn" that can be passed into static_branch_update kfunc
> > > > > > > or static_branch_update sys_bpf command.
> > > > > > > The question is whether we need a new map type (array derivative)
> > > > > > > to hold a set of "address of insn" or it can be a part of an existing
> > > > > > > global data array.
> > > > > > > A new map type is easier to reason about.
> > > > > > > Notice how such a new map type is not a map type of static branches.
> > > > > > > It's not a map type of goto_or_nop instructions either.
> > > > > > >
> > > > > > > At load time libbpf can populate this array with indices of insns
> > > > > > > that the verifier and JIT need to track. Once JITed the array is readonly
> > > > > > > for bpf prog and for user space.
> > > > > >
> > > > > > So this will be a map per .insn_addrs.X section (where X is key or
> > > > > > a pre-defined suffix for jump tables or indirect calls). And to tell
> > > > > > the verifier about these maps we will need to pass an array of
> > > > > >
> > > > > >     struct {
> > > > > >             u32 map_fd;
> > > > > >             u32 type; /* static key, jump table, etc. */
> > > > > >     }
> > > > > >
> > > > > > on program load. Is this correct?
> > > > >
> > > > >
> > > > > Probably not.
> > > > > Since we're going with a new map type (at least for the sake of this
> > > > > discussion) it shouldn't need a new way to tell the verifier about it.
> > > > > If .insn_addrs.jmp_table_A was a section generated for switch() statement
> > > > > by llvm it will be created as a map by libbpf,
> > > > > and there will be an ld_imm64 insn generated by llvm that points
> > > > > to that map.
> > > > > libbpf will populate ld_imm64 insn with map_fd, just like it does
> > > > > for global data.
> > > >
> > > > I understand how this works for indirect jumps (and for the
> > > > bpf_static_branch_update(&foo) kfunc) where we have a ld_imm64 with a
> > > > map, however, I am still not sure how this will work for static
> > > > branches where we just have a 8 byte JA insn + an index in the
> > > > corresponding ".insn_addrs.foo" section. How kernel will know that the
> > > > program is using a corresponding map which we create from
> > > > ".insn_addrs.foo" without specifying this on program load?
> > > >
> > > > (Sorry for replying late, catching up after [simultaneous] pto &
> > > > covid.)
> > >
> > > sorry. ctx switch takes time.
> > > libbpf can just bpf_prog_bind_map to associate this new map type
> > > of ptr_to_insn with a program.
> > > Or I misunderstood the question?
> >
> > All ptr_to_insn maps are required during the verification. So
> > bpf_prog_bind_map can't be used as it requires an existing program.
>
> I see.
>
> > What could work and what I am proposing is to pass a list of bound
> > maps in prog_load attributes. Then such maps can be used during the
> > verification. For normal maps
> >
> >   prog = prog_load(attr={.bound_maps=maps})
> >
> > will be semantically the same as
> >
> >   prog = prog_load()
> >   bpf_prog_bind_map(prog, maps)
>
> Instead of a whole new api, let's make libbpf insert
> ld_imm64 r0, map
> as the first insn for this case for now.

This sounds like a big hack and unnecessary complication, tbh. I'd
like to avoid having to do this in libbpf.

But I think we almost have this already supported. In BPF_PROG_LOAD
UAPI we have fd_array property, right? I think right now we lazily
refcnt referenced maps. But I think it makes total sense to just
define that bpf_prog will keep references to all BPF objects passed in
through fd_array, WDYT? Verifier will just iterate all provided FDs,
determine kind of BPF object it is (and reject unknown ones), and then
just take refcnts on each of them once. On prog free we'll just do the
same in reverse and be done with it.

It also can be used as a batch and single-step (in the sense it will
be done as part of program load instead of a separate command)
alternative for bpf_prog_bind_map(), I suppose.

Thoughts?