On Tue, Mar 26, 2019 at 06:05:27PM +0000, Jiong Wang wrote: > eBPF ISA specification requires high 32-bit cleared when low 32-bit > sub-register is written. This applies to destination register of ALU32 etc. > JIT back-ends must guarantee this semantic when doing code-gen. > > x86-64 and arm64 ISA has the same semantic, so the corresponding JIT > back-end doesn't need to do extra work. However, 32-bit arches (arm, nfp > etc.) and some other 64-bit arches (powerpc, sparc etc), need explicit zero > extension sequence to meet such semantic. > > This is important, because for code the following: > > u64_value = (u64) u32_value > ... other uses of u64_value > > compiler could exploit the semantic described above and save those zero > extensions for extending u32_value to u64_value. Hardware, runtime, or BPF > JIT back-ends, are responsible for guaranteeing this. Some benchmarks show > ~40% sub-register writes out of total insns, meaning ~40% extra code-gen ( > could go up to more for some arches which requires two shifts for zero > extension) because JIT back-end needs to do extra code-gen for all such > instructions. > > However this is not always necessary in case u32_value is never cast into > a u64, which is quite normal in real life program. So, it would be really > good if we could identify those places where such type cast happened, and > only do zero extensions for them, not for the others. This could save a lot > of BPF code-gen. > > Algo: > - Record indices of instructions that do sub-register def (write). And > these indices need to stay with function state so path pruning and bpf > to bpf function call could be handled properly. > > These indices are kept up to date while doing insn walk. > > - A full register read on an active sub-register def marks the def insn as > needing zero extension on dst register. > > - A new sub-register write overrides the old one. > > A new full register write makes the register free of zero extension on > dst register. > > - When propagating register read64 during path pruning, it also marks def > insns whose defs are hanging active sub-register, if there is any read64 > from shown from the equal state. > > Reviewed-by: Jakub Kicinski <jakub.kicinski@xxxxxxxxxxxxx> > Signed-off-by: Jiong Wang <jiong.wang@xxxxxxxxxxxxx> > --- > include/linux/bpf_verifier.h | 4 +++ > kernel/bpf/verifier.c | 85 +++++++++++++++++++++++++++++++++++++++++--- > 2 files changed, 84 insertions(+), 5 deletions(-) > > diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h > index 27761ab..0ae9a3f 100644 > --- a/include/linux/bpf_verifier.h > +++ b/include/linux/bpf_verifier.h > @@ -181,6 +181,9 @@ struct bpf_func_state { > */ > u32 subprogno; > > + /* tracks subreg definition. */ > + s32 subreg_def[MAX_BPF_REG]; Same comment as Ed and another question: why it's not part of bpf_reg_state ?