Re: [PATCH bpf-next v2 2/9] riscv, bpf: add support for far branching

Björn Töpel <bjorn.topel@xxxxxxxxx> · Tue, 7 Jan 2020 09:13:56 +0100

Back from the holidays; Sorry about the delayed reply.

On Mon, 23 Dec 2019 at 19:03, Palmer Dabbelt <palmerdabbelt@xxxxxxxxxx> wrote:
>
> On Mon, 16 Dec 2019 01:13:36 PST (-0800), Bjorn Topel wrote:
> > This commit adds branch relaxation to the BPF JIT, and with that
[...]
> > @@ -1557,6 +1569,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
> >  {
> >       bool tmp_blinded = false, extra_pass = false;
> >       struct bpf_prog *tmp, *orig_prog = prog;
> > +     int pass = 0, prev_ninsns = 0, i;
> >       struct rv_jit_data *jit_data;
> >       struct rv_jit_context *ctx;
> >       unsigned int image_size;
> > @@ -1596,15 +1609,25 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
> >               prog = orig_prog;
> >               goto out_offset;
> >       }
> > +     for (i = 0; i < prog->len; i++) {
> > +             prev_ninsns += 32;
> > +             ctx->offset[i] = prev_ninsns;
> > +     }
>
> It feels like the first-order implementation is the same as binutils here: the
> first round is worst cased, after which things can be more exact.  We're only
> doing one pass in binutils because most of the relaxation happens in the
> linker, but this approach seems reasonable to me.  I'd be interested in seeing
> some benchmarks, as it may be worth relaxing these in the binutils linker as
> well -- I can certainly come up with contrived test cases that aren't relaxed,
> but I'm not sure how common this is.
>

Ah, interesting! Let me try to pull out some branch relaxation
statistics/benchmarks for the BPF selftests.

> My only worry is that that invariant should be more explicit.  Specifically,
> I'm thinking that every time offset is updated there should be some sort of
> assertion that the offset is shrinking.  This is enforced structurally in the
> binutils code because we only generate code once and then move it around, but
> since you're generating code every time it'd be easy for a bug to sneak in as
> the JIT gets more complicated.
>

Hmm, yes. Maybe use a checksum for the program in addition to the
length invariant, and converge condition would then be prev_len == len
&& prev_crc == crc?

> Since most of the branches should be forward, you'll probably end up with way
> fewer iterations if you do the optimization passes backwards.
>

Good idea!

> > -     /* First pass generates the ctx->offset, but does not emit an image. */
> > -     if (build_body(ctx, extra_pass)) {
> > -             prog = orig_prog;
> > -             goto out_offset;
> > +     for (i = 0; i < 16; i++) {
> > +             pass++;
> > +             ctx->ninsns = 0;
> > +             if (build_body(ctx, extra_pass)) {
> > +                     prog = orig_prog;
> > +                     goto out_offset;
>
> Isn't this returning a broken program if build_body() errors out the first time
> through?
>

Hmm, care to elaborate? I don't see how?

> > +             }
> > +             build_prologue(ctx);
> > +             ctx->epilogue_offset = ctx->ninsns;
> > +             build_epilogue(ctx);
> > +             if (ctx->ninsns == prev_ninsns)
> > +                     break;
> > +             prev_ninsns = ctx->ninsns;
>
> IDK how important the performance of the JIT is, but you could probably get
> away with skipping an iteration by keeping track of some simple metric that
> determines if it would be possible to
>

...to? Given that the programs are getting larger, performance of the
JIT is important. So, any means the number of passes can be reduced is
a good thing!

Thanks for the review/suggestions!
Björn