Re: [PATCH v1 bpf-next 3/9] bpf: expose how xlated insns map to jitted insns

On Thu, Feb 8, 2024 at 3:11 AM Anton Protopopov <aspsk@xxxxxxxxxxxxx> wrote:
>
> On Tue, Feb 06, 2024 at 06:26:12PM -0800, Alexei Starovoitov wrote:
> > On Tue, Feb 6, 2024 at 2:08 AM Anton Protopopov <aspsk@xxxxxxxxxxxxx> wrote:
> > >
> > > On Mon, Feb 05, 2024 at 05:09:51PM -0800, Alexei Starovoitov wrote:
> > > > On Fri, Feb 2, 2024 at 8:34 AM Anton Protopopov <aspsk@xxxxxxxxxxxxx> wrote:
> > > > >
> > > > > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > > > > index 4def3dde35f6..bdd6be718e82 100644
> > > > > --- a/include/linux/bpf.h
> > > > > +++ b/include/linux/bpf.h
> > > > > @@ -1524,6 +1524,13 @@ struct bpf_prog_aux {
> > > > >         };
> > > > >         /* an array of original indexes for all xlated instructions */
> > > > >         u32 *orig_idx;
> > > > > +       /* for every xlated instruction point to all generated jited
> > > > > +        * instructions, if allocated
> > > > > +        */
> > > > > +       struct {
> > > > > +               u32 off;        /* local offset in the jitted code */
> > > > > +               u32 len;        /* the total len of generated jit code */
> > > > > +       } *xlated_to_jit;
> > > >
> > > > Simply put Nack to this approach.
> > > >
> > > > Patches 2 and 3 add an extreme amount of memory overhead.
> > > >
> > > > As we discussed during office hours we need a "pointer to insn" concept
> > > > aka "index on insn".
> > > > The verifier would need to track that such things exist and adjust
> > > > indices of insns when patching affects those indices.
> > > >
> > > > For every static branch there will be one such "pointer to insn".
> > > > Different algorithms can be used to keep them correct.
> > > > The simplest 'let's iterate over all such pointers and update them'
> > > > during patch_insn() may even be ok to start.
> > > >
> > > > Such "pointer to insn" won't add any memory overhead.
> > > > When patch+jit is done all such "pointer to insn" are fixed value.
> > >
> > > Ok, thanks for looking, this makes sense.
> >
> > Before jumping into coding I think it would be good to discuss
> > the design first.
> > I'm thinking such "address of insn" will be similar to
> > existing "address of subprog",
> > which is encoded in ld_imm64 as BPF_PSEUDO_FUNC.
> > "address of insn" would be a bit more involved to track
> > during JIT and likely trivial during insn patching,
> > since we're already doing imm adjustment for pseudo_func.
> > So that part of design is straightforward.
> > Implementation in the kernel and libbpf can copy-paste from pseudo_func too.
>
> To implement the "primitive version" of static branches, where the
> only API is `static_branch_update(xlated off, on/off)`, the only
> requirement is to build the `xlated -> jitted` mapping (which is done
> in the JIT, after verification). This can be done with a simplified
> version of this patch, without the xlated->orig mapping and with the
> xlated->jit mapping only done for gotol_or_nop instructions.

yes. An array of insn->jit_addr entries, sized to the number of
goto_or_nop-s in the prog, will work for user space to flip them, but...

> The "address of insn" appears when we want to provide a
> higher-level API, where some object (in user-space or in the kernel)
> keeps track of one or more gotol_or_nop instructions, so that after
> the program load this controlling object has a list of xlated offsets.
> But this would be a follow-up to the initial static branches patch.

this won't work as a follow-up,
since such an array won't work for a bpf prog that wants to flip branches.
There is nothing that associates a static_branch name/id with a
particular goto_or_nop.
There could be a kfunc that a bpf prog calls, but it could only
flip all such insns in the prog,
unless we start encoding a special id inside goto_or_nop or other hacks.

> > The question is whether such "address of insn" should be allowed
> > in the data section. If so, we need to brainstorm how to
> > do it cleanly.
> > We had various hacks for similar things in the past, like prog_array.
> > Let's not repeat such mistakes.
>
> So, a data section is required for implementing jump tables? Like,
> adding a new PTR_TO_LABEL or PTR_TO_INSN data type, and a
> corresponding "ptr to insn" object for every occurrence of &&label,
> which will be adjusted during verification.
> Looks to me like this one doesn't require any more API than specifying
> a list of &&label occurrences at program load.
>
> For "static keys" though (a feature on top of this patch series) we
> need to have access to the corresponding set of adjusted pointers.
>
> Isn't this enough to add something like an array of
>
>   struct insn_ptr {
>       u32 type;     /* LABEL, STATIC_BRANCH, ... */
>       u32 insn_off; /* original offset on load */
>       union {
>           struct label {...};
>           struct st_branch { u32 key_id; ... };
>       };
>   };

which I don't like because it hard-codes static_branch needs into
the insn->jit_addr association.
"address of insn" should be an individual building block without
bolted-on parts.

A data section with a set of such "address of insn" entries
can be a description of one static_branch.
There will be different ways to combine such building blocks.
For example:
static_branch(foo) can emit goto_or_nop into the bpf code
and add an "address of insn" into a section ".insn_addrs.foo".
This section is what libbpf and bpf progs will recognize as a set
of "address of insn" that can be passed into a static_branch_update kfunc
or a static_branch_update sys_bpf command.
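
A rough sketch of what such a static_branch(foo) could expand to,
modeled on the kernel's jump-label macros (the goto_or_nop mnemonic,
asm-goto support on the BPF target, and the exact section layout are
all assumptions here, not a finalized ABI):

  /* Hypothetical: emit a goto_or_nop insn and record that insn's
   * address in ".insn_addrs.foo" for libbpf to pick up via an ELF
   * relocation.
   */
  static __always_inline bool static_branch_foo(void)
  {
          asm goto("1: goto_or_nop %l[l_yes]\n\t"
                   ".pushsection .insn_addrs.foo, \"aw\"\n\t"
                   ".quad 1b\n\t"
                   ".popsection"
                   : : : : l_yes);
          return false;
  l_yes:
          return true;
  }
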
The question is whether we need a new map type (an array derivative)
to hold a set of "address of insn", or whether it can be part of an
existing global data array.
A new map type is easier to reason about.
Notice how such a new map type is not a map type of static branches.
It's not a map type of goto_or_nop instructions either.

At load time libbpf can populate this array with indices of the insns
that the verifier and JIT need to track. Once JITed, the array is
read-only for the bpf prog and for user space.

With that mechanism compilers can generate a proper switch() jmp table.
The llvm work can be a follow-up, of course, but the whole design needs
to be thought through to cover all use cases.

To summarize, here's what I'm proposing:
- a PTR_TO_INSN verifier regtype that can be passed to a static_branch_update kfunc
- a new map type (array) that holds objects that are PTR_TO_INSN for the verifier;
  libbpf populates this array with indices of the insns it wants to track,
  and the bpf prog needs to "use" this array, so the prog/map association is built
- the verifier/JIT update each PTR_TO_INSN during transformations
- a static_branch(foo) macro emits a goto_or_nop insn and adds 8 bytes
  to the ".insn_addrs.foo" section with an ELF relocation that
  libbpf will convert into an index
A sketch of how these pieces could fit together follows below.
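
Tying those pieces together, a bpf prog could look roughly like this
(the BPF_MAP_TYPE_INSN_ARRAY name, the kfunc signature, and the lookup
returning PTR_TO_INSN are all assumptions for illustration):

  #include <vmlinux.h>
  #include <bpf/bpf_helpers.h>

  /* Hypothetical map type holding "address of insn" objects. */
  struct {
          __uint(type, BPF_MAP_TYPE_INSN_ARRAY);
          __uint(max_entries, 1);
          __type(key, __u32);
          __type(value, __u64);
  } foo_insns SEC(".maps");

  /* Hypothetical kfunc; the verifier would require PTR_TO_INSN here. */
  extern int static_branch_update(void *insn, bool on) __ksym;

  SEC("syscall")
  int flip_foo(void *ctx)
  {
          __u32 key = 0;
          void *insn = bpf_map_lookup_elem(&foo_insns, &key);

          if (insn) /* PTR_TO_INSN under the proposed scheme */
                  static_branch_update(insn, true);
          return 0;
  }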

When compilers implement jmp tables for switch(key) they will generate
".insn_addrs.uniq_suffix" sections and emit:

  rX = ld_imm64 that_section
  rX += switch_key
  rY = *(u64 *)rX
  jmpx rY
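
For example, a dense switch() like the following could be lowered to
exactly that four-insn sequence plus a table of jmp-target addresses
in ".insn_addrs.uniq_suffix" (illustrative C; handle_*() are
placeholder subprogs, and llvm does not emit this today):

  /* With a dense key range the compiler can replace the
   * compare-and-branch chain with the jmp table above,
   * one table entry per case label.
   */
  switch (key) {
  case 0: ret = handle_a(ctx); break;
  case 1: ret = handle_b(ctx); break;
  case 2: ret = handle_c(ctx); break;
  case 3: ret = handle_d(ctx); break;
  }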

The verifier would need to do push_stack() for this indirect jmp insn
as many times as there are elements in the ".insn_addrs.uniq_suffix" array.
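
In verifier terms, roughly (pseudo-code: push_stack() exists in the
verifier today, while map_insn_idx() is a hypothetical helper that
reads a tracked insn index out of the backing array):

  /* For an indirect jmp through a PTR_TO_INSN, explore every
   * possible target recorded in the array.
   */
  for (i = 0; i < map->max_entries; i++)
          if (!push_stack(env, map_insn_idx(map, i), insn_idx, false))
                  return -ENOMEM;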

And similar for indirect calls.
That section becomes an array of pointers to functions.
We can make it more flexible for indirect callx by
storing the BTF func proto and allowing global subprogs with the same
proto to match as safe call targets.
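
For illustration, two global subprogs sharing one BTF func proto that
could both be admitted as targets of the same indirect call under that
matching rule (a sketch; the matching semantics are not finalized):

  /* Identical protos: int (*)(struct xdp_md *) */
  __noinline int handler_a(struct xdp_md *ctx) { return XDP_PASS; }
  __noinline int handler_b(struct xdp_md *ctx) { return XDP_DROP; }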




