Alexei Starovoitov writes: > On Wed, Mar 27, 2019 at 05:18:35PM +0000, Jiong Wang wrote: >> >> > On 27 Mar 2019, at 17:17, Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> wrote: >> > >> > On Wed, Mar 27, 2019 at 05:06:01PM +0000, Jiong Wang wrote: >> >> >> >>> On 27 Mar 2019, at 17:00, Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> wrote: >> >>> >> >>> On Tue, Mar 26, 2019 at 06:05:29PM +0000, Jiong Wang wrote: >> >>>> After previous patches, verifier has marked those instructions that really >> >>>> need zero extension on dst_reg. >> >>>> >> >>>> It is then for all back-ends to decide how to use such information to >> >>>> eliminate unnecessary zero extension codegen during JIT compilation. >> >>>> >> >>>> One approach is: >> >>>> 1. Verifier insert explicit zero extension for those instructions that >> >>>> need zero extension. >> >>>> 2. All JIT back-ends do NOT generate zero extension for sub-register >> >>>> write any more. >> >>>> >> >>>> The good thing for this approach is no major change on JIT back-end >> >>>> interface, all back-ends could get this optimization. >> >>>> >> >>>> However, only those back-ends that do not have hardware zero extension >> >>>> want this optimization. For back-ends like x86_64 and AArch64, there is >> >>>> hardware support, so this optimization should be disabled. >> >>>> >> >>>> This patch introduces new sysctl "bpf_jit_32bit_opt" which is the control >> >>>> variable for whether the optimization should be enabled. >> >>>> >> >>>> It is initialized using target hook bpf_jit_hardware_zext which is default >> >>>> true, meaning the underlying hardware will do zero extension automatically, >> >>>> therefore the optimization will be disabled. >> >>>> >> >>>> Offload targets do not use this native target hook, instead, they could >> >>>> get the optimization results using bpf_prog_offload_ops.finalize. >> >>>> >> >>>> The user could always enable or disable the optimization by using: >> >>>> >> >>>> sysctl net/core/bpf_jit_32bit_opt=[0 | 1] >> >>> >> >>> I don't think there should be a sysctl for this. >> >> >> >> The sysctl introduced mostly because I think it could be useful for testing. >> >> For example on x86_64, with this sysctl, we can enable the optimisation and >> >> can run selftest. >> >> >> >> Does this make sense? >> >> >> >> Or when one insn is marked, we print verbose info, so the tester could catch >> >> it from log? >> > >> > sysctl in this patch only triggers insertion of shifts. >> > what kind of testing does it enable on x64? >> > The writing insn is already 32-bit and hw does zero extend. >> > These two shifts is always a nop? >> > a sysctl to test that the verifier inserted shifts in the right place? >> >> Yes, that’s the test methodology I am using. Match the instruction sequence after >> shifts insertion. > > I see. I don't think such extra shifts right after hw zero extend will catch much. > imo it would be better to populate upper 32-bit with random values on x64 > where verifier analysis showed that it's ok to do so. Sound like a good idea, indeed gives much more stressful test on x64, and if all tests passed under test_progs + -mattr=+alu32, then could be very good assurance on the correctness. > Such extra insns can be inserted by the verifier. Since such debugging > has run-time cost we'd need a flag to turn it on. > May be a new flag during prog load instead of sysctl? OK, I will explore on this line, see if could have a clean solution. > It can be a global switch inside libbpf, so test_verifier and test_progs > wouldn't need to pass it everywhere explictly. It would double the test time, > but it's worth doing always on all archs. Especially on x64. > > other thoughts... > I guess it's ok to stick with shifts for now. > Introducing new insn would be nice, but we can do it later. > Changing all jits for this new insn as pre-patch to this set is too much. +1 > peephole to convert shifts is probably useful regardless. > bpf backend emits a bunch of useless shifts when alu32 is not used. > Would be great if x86 jit can optimize it for such lazy users > (and users who don't upgrade llvm fast enough or don't know about alu32) Will do some checks on generic eBPF code-gen later to see how much peephole opportunities there are. Regards, Jiong