Back from the holidays; Sorry about the delayed reply. On Mon, 23 Dec 2019 at 19:03, Palmer Dabbelt <palmerdabbelt@xxxxxxxxxx> wrote: > > On Mon, 16 Dec 2019 01:13:36 PST (-0800), Bjorn Topel wrote: > > This commit adds branch relaxation to the BPF JIT, and with that [...] > > @@ -1557,6 +1569,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog) > > { > > bool tmp_blinded = false, extra_pass = false; > > struct bpf_prog *tmp, *orig_prog = prog; > > + int pass = 0, prev_ninsns = 0, i; > > struct rv_jit_data *jit_data; > > struct rv_jit_context *ctx; > > unsigned int image_size; > > @@ -1596,15 +1609,25 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog) > > prog = orig_prog; > > goto out_offset; > > } > > + for (i = 0; i < prog->len; i++) { > > + prev_ninsns += 32; > > + ctx->offset[i] = prev_ninsns; > > + } > > It feels like the first-order implementation is the same as binutils here: the > first round is worst cased, after which things can be more exact. We're only > doing one pass in binutils because most of the relaxation happens in the > linker, but this approach seems reasonable to me. I'd be interested in seeing > some benchmarks, as it may be worth relaxing these in the binutils linker as > well -- I can certainly come up with contrived test cases that aren't relaxed, > but I'm not sure how common this is. > Ah, interesting! Let me try to pull out some branch relaxation statistics/benchmarks for the BPF selftests. > My only worry is that that invariant should be more explicit. Specifically, > I'm thinking that every time offset is updated there should be some sort of > assertion that the offset is shrinking. This is enforced structurally in the > binutils code because we only generate code once and then move it around, but > since you're generating code every time it'd be easy for a bug to sneak in as > the JIT gets more complicated. > Hmm, yes. Maybe use a checksum for the program in addition to the length invariant, and converge condition would then be prev_len == len && prev_crc == crc? > Since most of the branches should be forward, you'll probably end up with way > fewer iterations if you do the optimization passes backwards. > Good idea! > > - /* First pass generates the ctx->offset, but does not emit an image. */ > > - if (build_body(ctx, extra_pass)) { > > - prog = orig_prog; > > - goto out_offset; > > + for (i = 0; i < 16; i++) { > > + pass++; > > + ctx->ninsns = 0; > > + if (build_body(ctx, extra_pass)) { > > + prog = orig_prog; > > + goto out_offset; > > Isn't this returning a broken program if build_body() errors out the first time > through? > Hmm, care to elaborate? I don't see how? > > + } > > + build_prologue(ctx); > > + ctx->epilogue_offset = ctx->ninsns; > > + build_epilogue(ctx); > > + if (ctx->ninsns == prev_ninsns) > > + break; > > + prev_ninsns = ctx->ninsns; > > IDK how important the performance of the JIT is, but you could probably get > away with skipping an iteration by keeping track of some simple metric that > determines if it would be possible to > ...to? Given that the programs are getting larger, performance of the JIT is important. So, any means the number of passes can be reduced is a good thing! Thanks for the review/suggestions! Björn