> On 27 Mar 2019, at 16:50, Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> wrote: > > On Tue, Mar 26, 2019 at 06:05:27PM +0000, Jiong Wang wrote: >> eBPF ISA specification requires high 32-bit cleared when low 32-bit >> sub-register is written. This applies to destination register of ALU32 etc. >> JIT back-ends must guarantee this semantic when doing code-gen. >> >> x86-64 and arm64 ISA has the same semantic, so the corresponding JIT >> back-end doesn't need to do extra work. However, 32-bit arches (arm, nfp >> etc.) and some other 64-bit arches (powerpc, sparc etc), need explicit zero >> extension sequence to meet such semantic. >> >> This is important, because for code the following: >> >> u64_value = (u64) u32_value >> ... other uses of u64_value >> >> compiler could exploit the semantic described above and save those zero >> extensions for extending u32_value to u64_value. Hardware, runtime, or BPF >> JIT back-ends, are responsible for guaranteeing this. Some benchmarks show >> ~40% sub-register writes out of total insns, meaning ~40% extra code-gen ( >> could go up to more for some arches which requires two shifts for zero >> extension) because JIT back-end needs to do extra code-gen for all such >> instructions. >> >> However this is not always necessary in case u32_value is never cast into >> a u64, which is quite normal in real life program. So, it would be really >> good if we could identify those places where such type cast happened, and >> only do zero extensions for them, not for the others. This could save a lot >> of BPF code-gen. >> >> Algo: >> - Record indices of instructions that do sub-register def (write). And >> these indices need to stay with function state so path pruning and bpf >> to bpf function call could be handled properly. >> >> These indices are kept up to date while doing insn walk. >> >> - A full register read on an active sub-register def marks the def insn as >> needing zero extension on dst register. >> >> - A new sub-register write overrides the old one. >> >> A new full register write makes the register free of zero extension on >> dst register. >> >> - When propagating register read64 during path pruning, it also marks def >> insns whose defs are hanging active sub-register, if there is any read64 >> from shown from the equal state. >> >> Reviewed-by: Jakub Kicinski <jakub.kicinski@xxxxxxxxxxxxx> >> Signed-off-by: Jiong Wang <jiong.wang@xxxxxxxxxxxxx> >> --- >> include/linux/bpf_verifier.h | 4 +++ >> kernel/bpf/verifier.c | 85 +++++++++++++++++++++++++++++++++++++++++--- >> 2 files changed, 84 insertions(+), 5 deletions(-) >> >> diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h >> index 27761ab..0ae9a3f 100644 >> --- a/include/linux/bpf_verifier.h >> +++ b/include/linux/bpf_verifier.h >> @@ -181,6 +181,9 @@ struct bpf_func_state { >> */ >> u32 subprogno; >> >> + /* tracks subreg definition. */ >> + s32 subreg_def[MAX_BPF_REG]; > > Same comment as Ed and another question: why it's not part of bpf_reg_state ? Thanks for spotting this, indeed, it looks perfect to be merged into bpf_reg_state. Regards, Jiong