Re: [PATCH v1 bpf-next 3/9] bpf: expose how xlated insns map to jitted insns

Anton Protopopov <aspsk@xxxxxxxxxxxxx> · Fri, 16 Feb 2024 13:57:53 +0000

On Wed, Feb 14, 2024 at 10:48:26PM -0800, Alexei Starovoitov wrote:
> On Thu, Feb 8, 2024 at 3:11 AM Anton Protopopov <aspsk@xxxxxxxxxxxxx> wrote:
> >
> > On Tue, Feb 06, 2024 at 06:26:12PM -0800, Alexei Starovoitov wrote:
> > > On Tue, Feb 6, 2024 at 2:08 AM Anton Protopopov <aspsk@xxxxxxxxxxxxx> wrote:
> > > >
> > > > On Mon, Feb 05, 2024 at 05:09:51PM -0800, Alexei Starovoitov wrote:
> > > > > On Fri, Feb 2, 2024 at 8:34 AM Anton Protopopov <aspsk@xxxxxxxxxxxxx> wrote:
> > > > > >
> > > > > > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > > > > > index 4def3dde35f6..bdd6be718e82 100644
> > > > > > --- a/include/linux/bpf.h
> > > > > > +++ b/include/linux/bpf.h
> > > > > > @@ -1524,6 +1524,13 @@ struct bpf_prog_aux {
> > > > > >         };
> > > > > >         /* an array of original indexes for all xlated instructions */
> > > > > >         u32 *orig_idx;
> > > > > > +       /* for every xlated instruction point to all generated jited
> > > > > > +        * instructions, if allocated
> > > > > > +        */
> > > > > > +       struct {
> > > > > > +               u32 off;        /* local offset in the jitted code */
> > > > > > +               u32 len;        /* the total len of generated jit code */
> > > > > > +       } *xlated_to_jit;
> > > > >
> > > > > Simply put Nack to this approach.
> > > > >
> > > > > Patches 2 and 3 add an extreme amount of memory overhead.
> > > > >
> > > > > As we discussed during office hours we need a "pointer to insn" concept
> > > > > aka "index on insn".
> > > > > The verifier would need to track that such things exist and adjust
> > > > > indices of insns when patching affects those indices.
> > > > >
> > > > > For every static branch there will be one such "pointer to insn".
> > > > > Different algorithms can be used to keep them correct.
> > > > > The simplest 'lets iterate over all such pointers and update them'
> > > > > during patch_insn() may even be ok to start.
> > > > >
> > > > > Such "pointer to insn" won't add any memory overhead.
> > > > > When patch+jit is done all such "pointer to insn" are fixed value.
> > > >
> > > > Ok, thanks for looking, this makes sense.
> > >
> > > Before jumping into coding I think it would be good to discuss
> > > the design first.
> > > I'm thinking such "address of insn" will be similar to
> > > existing "address of subprog",
> > > which is encoded in ld_imm64 as BPF_PSEUDO_FUNC.
> > > "address of insn" would be a bit more involved to track
> > > during JIT and likely trivial during insn patching,
> > > since we're already doing imm adjustment for pseudo_func.
> > > So that part of design is straightforward.
> > > Implementation in the kernel and libbpf can copy paste from pseudo_func too.
> >
> > To implement the "primitive version" of static branches, where the
> > only API is `static_branch_update(xlated off, on/off)` the only
> > requirement is to build `xlated -> jitted` mapping (which is done
> > in JIT, after the verification). This can be done in a simplified
> > version of this patch, without xlated->orig mapping and with
> > xlated->jit mapping only done to gotol_or_nop instructions.
> 
> yes. The array of insn->jit_addr sized with as many goto_or_nop-s
> the prog will work for user space to flip them, but...
> 
> > The "address of insn" appears when we want to provide a more
> > higher-level API when some object (in user-space or in kernel) keeps
> > track of one or more gotol_or_nop instructions so that after the
> > program load this controlling object has a list of xlated offsets.
> > But this would be a follow-up to the initial static branches patch.
> 
> this won't work as a follow up,
> since such an array won't work for bpf prog that wants to flip branches.
> There is nothing that associates static_branch name/id with
> particular goto_or_nop.
> There could be a kfunc that bpf prog calls, but it can only
> flip all of such insns in the prog.
> Unless we start encoding a special id inside goto_or_nop or other hacks.
> 
> > > The question is whether such "address of insn" should be allowed
> > > in the data section. If so, we need to brainstorm how to
> > > do it cleanly.
> > > We had various hacks for similar things in the past. Like prog_array.
> > > Let's not repeat such mistakes.
> >
> > So, data section is required for implementing jump tables? Like,
> > to add a new PTR_TO_LABEL or PTR_TO_INSN data type, and a
> > corresponding "ptr to insn" object for every occurence of &&label,
> > which will be adjusted during verification.
> > Looks to me like this one doesn't require any more API than specifying
> > a list of &&label occurencies on program load.
> >
> > For "static keys" though (a feature on top of this patch series) we
> > need to have access to the corresponding set of adjusted pointers.
> >
> > Isn't this enough to add something like an array of
> >
> >   struct insn_ptr {
> >       u32 type; /* LABEL, STATIC_BRANCH,... */
> >       u32 insn_off; /* original offset on load */
> >       union {
> >           struct label {...};
> >           struct st_branch { u32 key_id, ..};
> >       };
> >   };
> 
> which I don't like because it hard codes static_branch needs into
> insn->jit_addr association.
> "address of insn" should be an individual building block without
> bolted on parts.
> 
> A data section with a set of such "address of insn"
> can be a description of one static_branch.
> There will be different ways to combine such building blocks.
> For example:
> static_branch(foo) can emit goto_or_nop into bpf code
> and add "address of insn" into a section '.insn_addrs.foo".
> This section is what libbpf and bpf prog will recognize as a set
> of "address of insn" that can be passed into static_branch_update kfunc
> or static_branch_update sys_bpf command.
> The question is whether we need a new map type (array derivative)
> to hold a set of "address of insn" or it can be a part of an existing
> global data array.
> A new map type is easier to reason about.
> Notice how such a new map type is not a map type of static branches.
> It's not a map type of goto_or_nop instructions either.
> 
> At load time libbpf can populate this array with indices of insns
> that the verifier and JIT need to track. Once JITed the array is readonly
> for bpf prog and for user space.

So this will be a map per .insn_addrs.X section (where X is key or
a pre-defined suffix for jump tables or indirect calls). And to tell
the verifier about these maps we will need to pass an array of

    struct {
            u32 map_fd;
            u32 type; /* static key, jump table, etc. */
    }

on program load. Is this correct?

> With that mechanism compilers can generate a proper switch() jmp table.
> llvm work can be a follow up, of course, but the whole design needs
> to be thought through to cover all use cases.
> 
> To summarize, here's what I'm proposing:
> - PTR_TO_INSN verifier regtype that can be passed to static_branch_update kfunc

If we have a set of pointers to jump instructions, generated from
static_branch(foo) for same foo, then this makes more sense to
provide a

    static_branch_update(foo)

(where foo is substituted by libbpf with a map fd of .insn_addrs.foo
on load). The same for userspace:

    bpf(STATIC_BRANCH_UPDATE, .attrs={.map_fd=foo})

> - new map type (array) that holds objects that are PTR_TO_INSN for the verifier
> libbpf populates this array with indices of insn it wants to track.
> bpf prog needs to "use" this array, so prog/map association is built.
> - verifier/JIT update each PTR_TO_INSN during transformations.
> - static_branch(foo) macro emits goto_or_nop insn and adds 8 bytes
> into ".insn_addrs.foo" section with an ELF relocation that
> libbpf will convert into index.
> 
> When compilers implement jmptables for switch(key) they will generate
> ".insn_addrs.uniq_suffix" sections and emit
> rX = ld_imm64 that_section
> rX += switch_key
> rY = *(u64 *)rX
> jmpx rY

What are the types for rX and rY? I thought that we will need to do
smth like

  rX = .insn_addrs.uniq_suffix[switch_key] /* rX has type PTR_TO_INSN */
  ...
  jmpx rX

this can be done if for switch cases (or any other goto *label alike) we generate

  rX = map_lookup_elem(.insn_addrs.uniq_suffix, index)
  jmpx rX

> The verifier would need to do push_stack() for this indirect jmp insn
> as many times as there are elements in ".insn_addrs.uniq_suffix" array.
> 
> And similar for indirect calls.
> That section becomes an array of pointers to functions.
> We can make it more flexible for indirect callx by
> storing BTF func proto and allowing global subprogs with same proto
> to match as safe call targets.