On Thu, Jun 23, 2022 at 05:02:34PM +0300, Eduard Zingerman wrote:
> On Wed, 2022-06-22 at 20:11 -0700, Alexei Starovoitov wrote:
> [...]
> > tbh sounds scary. We had so many bugs in the patch_insn over years.
> > The code is subtle.
> 
> In terms of testing strategy the following could be done:
> - use pseudo-random testing to verify that `bpf_patch_insn_data` and
>   the mechanism suggested in my email produce identical code;
> - use pseudo-random testing to verify that offsets after rewrite point
>   to the expected instructions (e.g. use the .imm field as a unique
>   marker for each instruction);
> - hand-written tests for corner cases.
> 
> Would you consider this to be sufficient? (Probably does not sound
> too reassuring coming from the person [me] who submits patches with
> trivial use-after-free errors :)
> 
> [...]
> 
> > Before proceeding too far based on artificial test please collect
> > the number of patches and their sizes we currently do across all progs
> > in selftests. Maybe it's not that bad in real code.
> 
> I will collect and share the stats on Saturday or Sunday.
> 
> > As far as algo the patch_and_copy_insn() assumes that 'dst' insn is a branch?
> > I guess I'm missing some parts of the proposed algo.
> 
> Sorry, I made a mistake in the original email, the code for
> patch_and_copy_insn() should look as follows:
> 
> static void patch_and_copy_insn(struct bpf_patching_state *state, int pc,
> 				struct bpf_insn *dst, struct bpf_insn *src)
> {
> 	memcpy(dst, src, sizeof(struct bpf_insn));
> 	// TODO: other classes
> 	// TODO: also adjust imm for calls
> 	if (BPF_CLASS(src->code) == BPF_JMP) {
> 		int new_pc = pc + lookup_delta_for_index(state, pc);
> 		int dst_pc = pc + src->off + 1;
> 		int new_dst = dst_pc + lookup_delta_for_index(state, dst_pc);
> 
> 		dst->off = new_dst - new_pc - 1;
> 	}
> }
> 
> The logic is as follows:
> - compute the new instruction index for the old pc;
> - compute the new instruction index for (old pc + offset);
> - use these two values to compute the new offset.
> 
> The draft implementation of this algorithm is at the end of this
> email.
> 
> > Instead of delaying everything maybe we can do all patching inline
> > except bpf_adj_branches?
> > Remember only:
> > int offset; /* index inside the original program,
> >              * the instruction at this index would be replaced.
> >              */
> > int insns_count; /* size of the patch */
> > int delta; /* difference in indexes between original program and
> >             * new program after application of all patches up to
> >             * and including this one.
> >             */
> > and do single bpf_adj_branches at the end ?
> 
> The algorithm consists of two parts:
> - To avoid excessive copying, patches are accumulated in a separate
>   array; the size of this array is doubled each time its capacity is
>   not sufficient to fit a new patch, so only O(log(n)) reallocations
>   are needed.
> - To avoid excessive branch adjustment passes, a single branch
>   adjustment pass is performed at the end. This pass visits each
>   instruction only once; however, for each instruction it has to
>   query the delta value in a sorted array, so the overall complexity
>   of this pass is O(n*log(n)).
> It is possible to adjust some relevant fields in `env` during this
> pass as well.

Before jumping into coding let's explore the alternatives.
Can we use some of the compiler techniques? Like:
- split the set of insns into basic blocks (BB)
- convert each jmp from a relative offset to a fixed BB index
- perform patching of insns. No jmp adjustments are necessary,
  since every patch applies within a BB. Minimal realloc+copy of
  insns within the BB.
- reconstruct the final set of insns from the set of BBs.

John, Yonghong, everyone, maybe other ideas?