On 05/03/2019 12:42 PM, Jiong Wang wrote: > eBPF ISA specification requires high 32-bit cleared when low 32-bit > sub-register is written. This applies to destination register of ALU32 etc. > JIT back-ends must guarantee this semantic when doing code-gen. > > x86-64 and arm64 ISA has the same semantic, so the corresponding JIT > back-end doesn't need to do extra work. However, 32-bit arches (arm, nfp > etc.) and some other 64-bit arches (powerpc, sparc etc), need explicit zero > extension sequence to meet such semantic. > > This is important, because for code the following: > > u64_value = (u64) u32_value > ... other uses of u64_value > > compiler could exploit the semantic described above and save those zero > extensions for extending u32_value to u64_value. Hardware, runtime, or BPF > JIT back-ends, are responsible for guaranteeing this. Some benchmarks show > ~40% sub-register writes out of total insns, meaning ~40% extra code-gen ( > could go up to more for some arches which requires two shifts for zero > extension) because JIT back-end needs to do extra code-gen for all such > instructions. > > However this is not always necessary in case u32_value is never cast into > a u64, which is quite normal in real life program. So, it would be really > good if we could identify those places where such type cast happened, and > only do zero extensions for them, not for the others. This could save a lot > of BPF code-gen. > > Algo: > - Split read flags into READ32 and READ64. > > - Record indices of instructions that do sub-register def (write). And > these indices need to stay with reg state so path pruning and bpf > to bpf function call could be handled properly. > > These indices are kept up to date while doing insn walk. > > - A full register read on an active sub-register def marks the def insn as > needing zero extension on dst register. > > - A new sub-register write overrides the old one. > > A new full register write makes the register free of zero extension on > dst register. > > - When propagating read64 during path pruning, also marks def insns whose > defs are hanging active sub-register. > > Reviewed-by: Jakub Kicinski <jakub.kicinski@xxxxxxxxxxxxx> > Signed-off-by: Jiong Wang <jiong.wang@xxxxxxxxxxxxx> [...] > +/* This function is supposed to be used by the following 32-bit optimization > + * code only. It returns TRUE if the source or destination register operates > + * on 64-bit, otherwise return FALSE. > + */ > +static bool is_reg64(struct bpf_verifier_env *env, struct bpf_insn *insn, > + u32 regno, struct bpf_reg_state *reg, enum reg_arg_type t) > +{ > + u8 code, class, op; > + > + code = insn->code; > + class = BPF_CLASS(code); > + op = BPF_OP(code); > + if (class == BPF_JMP) { > + /* BPF_EXIT for "main" will reach here. Return TRUE > + * conservatively. > + */ > + if (op == BPF_EXIT) > + return true; > + if (op == BPF_CALL) { > + /* BPF to BPF call will reach here because of marking > + * caller saved clobber with DST_OP_NO_MARK for which we > + * don't care the register def because they are anyway > + * marked as NOT_INIT already. > + */ > + if (insn->src_reg == BPF_PSEUDO_CALL) > + return false; > + /* Helper call will reach here because of arg type > + * check. > + */ > + if (t == SRC_OP) > + return helper_call_arg64(env, insn->imm, regno); > + > + return false; > + } > + } > + > + if (class == BPF_ALU64 || class == BPF_JMP || > + /* BPF_END always use BPF_ALU class. */ > + (class == BPF_ALU && op == BPF_END && insn->imm == 64)) > + return true; For the BPF_JMP + JA case we don't look at registers, but I presume here we 'pretend' to use 64 bit regs to be more conservative as verifier would otherwise need to do more complex analysis at the jump target wrt zero extension, correct? > + > + if (class == BPF_ALU || class == BPF_JMP32) > + return false; > + > + if (class == BPF_LDX) { > + if (t != SRC_OP) > + return BPF_SIZE(code) == BPF_DW; > + /* LDX source must be ptr. */ > + return true; > + } > + > + if (class == BPF_STX) { > + if (reg->type != SCALAR_VALUE) > + return true; > + return BPF_SIZE(code) == BPF_DW; > + } > + > + if (class == BPF_LD) { > + u8 mode = BPF_MODE(code); > + > + /* LD_IMM64 */ > + if (mode == BPF_IMM) > + return true; > + > + /* Both LD_IND and LD_ABS return 32-bit data. */ > + if (t != SRC_OP) > + return false; > + > + /* Implicit ctx ptr. */ > + if (regno == BPF_REG_6) > + return true; > + > + /* Explicit source could be any width. */ > + return true; > + } > + > + if (class == BPF_ST) > + /* The only source register for BPF_ST is a ptr. */ > + return true; > + > + /* Conservatively return true at default. */ > + return true; > +}