On Fri, Jun 24, 2022 at 11:39 AM Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> wrote:
>
> On Thu, Jun 23, 2022 at 05:02:34PM +0300, Eduard Zingerman wrote:
>
> > On Wed, 2022-06-22 at 20:11 -0700, Alexei Starovoitov wrote:
> > [...]
> > > tbh sounds scary. We had so many bugs in the patch_insn over years.
> > > The code is subtle.
> >
> > In terms of testing strategy the following could be done:
> > - use pseudo-random testing to verify that `bpf_patch_insn_data` and
> >   the mechanism suggested in my email produce identical code.
> > - use pseudo-random testing to verify that offsets after rewrite point
> >   to expected instructions (e.g. use .imm field as a unique marker for
> >   the instruction).
> > - hand-written tests for corner cases.
> >
> > Would you consider this to be sufficient? (Probably does not sound
> > too reassuring from the person[me] who submits patches with trivial
> > use after free errors :)
> >
> > [...]
> >
> > > Before proceeding too far based on artificial test please collect
> > > the number of patches and their sizes we currently do across all progs
> > > in selftests. Maybe it's not that bad in real code.
> >
> > I will collect and share the stats on Saturday or Sunday.
> >
> > > As far as algo the patch_and_copy_insn() assumes that 'dst' insn is a branch?
> > > I guess I'm missing some parts of the proposed algo.
> >
> > Sorry, I made a mistake in the original email, the code for
> > patch_and_copy_insn() should look as follows:
> >
> > static void patch_and_copy_insn(struct bpf_patching_state *state, int pc,
> >                                 struct bpf_insn *dst, struct bpf_insn *src) {
> >         memcpy(dst, src, sizeof(struct bpf_insn));
> >         // TODO: other classes
> >         // TODO: also adjust imm for calls
> >         if (BPF_CLASS(src->code) == BPF_JMP) {
> >                 int new_pc = pc + lookup_delta_for_index(state, pc);
> >                 int dst_pc = pc + src->off + 1;
> >                 int new_dst = dst_pc + lookup_delta_for_index(state, dst_pc);
> >                 dst->off = new_dst - new_pc - 1;
> >         }
> > }
> >
> > The logic is as follows:
> > - compute new instruction index for the old pc
> > - compute new instruction index for the (old pc + offset)
> > - use these values to compute the new offset
> >
> > The draft implementation of this algorithm is at the end of this
> > email.
> >
> > > Instead of delaying everything maybe we can do all patching inline
> > > except bpf_adj_branches?
> > > Remember only:
> > > int offset;      /* index inside the original program,
> > >                   * the instruction at this index would be replaced.
> > >                   */
> > > int insns_count; /* size of the patch */
> > > int delta;       /* difference in indexes between original program and
> > >                   * new program after application of all patches up to
> > >                   * and including this one.
> > >                   */
> > > and do single bpf_adj_branches at the end ?
> >
> > The algorithm consists of two parts:
> > - To avoid excessive copying patches are accumulated in a separate
> >   array and size of this array is doubled each time the capacity is
> >   not sufficient to fit a new patch. This should have O(log(n))
> >   complexity in terms of copied bytes.
> > - To avoid excessive branch adjustment passes a single branch
> >   adjustment pass is performed at the end. This pass visits each
> >   instruction only once, however, for each instruction it will have
> >   to query the delta value in a sorted array. Thus the overall
> >   complexity of this pass would be O(n*log(n)). It is possible to
> >   adjust some relevant fields in `env` during this pass as well.
>
> Before jumping into coding let's explore the alternatives.
> Can we use some of the compiler techniques?
> Like:
> - split a set of insn into basic blocks (BB)
> - convert each jmp from relative offset to fixed bb index
> - perform patching of insns. No jmp adjustments are necessary.
>   Every patch applies within bb. Minimal realloc+copy of insns within bb.
> - reconstruct a set of insn from a set of bb-s.
>
> John, Yonghong, everyone,
> may be other ideas?

We've had similar discussion ages (~3 years) ago. I had much more
context at that time, but I'll link to my latest proposal ([0]) within
that thread. Roughly the idea was along the lines of what Alexei
proposed above, but instead of a basic block we kept all instructions
as individual linked list nodes. So we'd need something like 2x of
memory relative to the current array of struct bpf_insns, but that
seems totally acceptable. And then instead of updating relative
indices, we'd keep a mapping from the original instruction index to
the latest real instruction index.

All this is pretty hazy for me at this point, but I thought I'd link
to that old discussion for completeness.

[0] https://lore.kernel.org/bpf/CAEf4BzYDAVUgajz4=dRTu5xQDddp5pi2s=T1BdFmRLZjOwGypQ@xxxxxxxxxxxxxx/
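
To make that recollection a bit more concrete, the shape of it was
roughly the following. This is purely an illustrative sketch with
made-up names and userspace-style allocation, not the code from the
old RFC:

#include <stdlib.h>
#include <linux/bpf.h>  /* struct bpf_insn */

/* Each instruction lives in its own node (~2x the memory of a plain
 * bpf_insn array: the insn plus a next pointer), so a patch is a local
 * list splice and never moves unrelated instructions.
 */
struct insn_node {
        struct bpf_insn insn;
        struct insn_node *next;
};

struct insn_list {
        struct insn_node *head;
        /* orig_idx_to_node[i] is the node that currently represents
         * original instruction i, so jump targets expressed as original
         * indices can still be resolved after any number of patches.
         */
        struct insn_node **orig_idx_to_node;
        size_t orig_cnt;
};

/* Replace original instruction 'orig_idx' with 'cnt' (>= 1) new
 * instructions.  The existing node is reused for patch[0], so the
 * orig->node mapping stays valid for every instruction; the remaining
 * patch instructions are spliced in right after it.  Branch offsets are
 * recomputed only once, in a final linearization pass that walks the
 * list and assigns new indices.
 */
static int patch_insn(struct insn_list *list, size_t orig_idx,
                      const struct bpf_insn *patch, size_t cnt)
{
        struct insn_node *node = list->orig_idx_to_node[orig_idx];
        size_t i;

        node->insn = patch[0];
        /* insert in reverse order so the patch ends up in program order */
        for (i = cnt - 1; i >= 1; i--) {
                struct insn_node *n = malloc(sizeof(*n));

                if (!n)
                        return -1;
                n->insn = patch[i];
                n->next = node->next;
                node->next = n;
        }
        return 0;
}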
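
And one small note on the quoted proposal itself: the
lookup_delta_for_index() helper that patch_and_copy_insn() relies on is
only in the draft at the end of Eduard's email, not in the quote above.
Assuming bpf_patching_state keeps the {offset, delta} records Alexei
listed, sorted by original offset, I'd expect it to be a plain binary
search, something like the sketch below (the field names are my guess,
not Eduard's draft):

struct bpf_patch_delta {
        int offset;     /* index in the original program where the patch was applied */
        int delta;      /* cumulative index shift after all patches up to this one */
};

struct bpf_patching_state {
        struct bpf_patch_delta *deltas; /* sorted by 'offset' */
        int ndeltas;
};

/* Return how far original instruction 'pc' has shifted in the patched
 * program: the cumulative delta of the last patch applied before 'pc'.
 * The '<' below assumes the instruction at a patch's own offset maps to
 * the first instruction of that patch.  One binary search per
 * instruction keeps the single adjustment pass at O(n*log(n)).
 */
static int lookup_delta_for_index(struct bpf_patching_state *state, int pc)
{
        int lo = 0, hi = state->ndeltas - 1, delta = 0;

        while (lo <= hi) {
                int mid = lo + (hi - lo) / 2;

                if (state->deltas[mid].offset < pc) {
                        delta = state->deltas[mid].delta;
                        lo = mid + 1;
                } else {
                        hi = mid - 1;
                }
        }
        return delta;
}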