On Sun, Mar 24, 2024 at 8:28 PM Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> wrote: > > On Fri, Mar 22, 2024 at 9:45 AM Andrii Nakryiko > <andrii.nakryiko@xxxxxxxxx> wrote: > > > > Designing and adding new instruction also sounds fine, but it's a > > separate feature and is more involved and will require careful > > considerations. E.g., we'll need to think through whether all JITs > > will be able to implement it in native code (i.e., whether they will > > have enough free registers to implement this efficiently; x86-64 is > > lucky to not need any extra registers, I believe ARM64 needs one extra > > register, though; other architectures I have no idea). Safety > > considerations as well (do we accept any random offset, or only ones > > coming from per-CPU BTF variables, etc). I don't know if there are any > > differences, but per-CPU access for module per-CPU variables is > > something to look into (I have no clue). > > I didn't mean to add an insn that is exposed all the way to > bpf prog and requires verification, addr checks, etc. > But a pseudo insn that the verifier can emit when it inlines > various helpers. > It's ok to add them iteratively and revise as we go, since it > won't be an api. Ah, ok, so an instruction that JIT will know about, but the verifier will reject if provided by the user? Ok, I can do that. > Like per_cpu_ptr() equivalent insn doesn't need to be added right away. > this_cpu_ptr(offset) will be good enough to start. > JIT-ing that single insn will be trivial and > you implemented it already: > EMIT3_off32(0x65, 0x8b, 0x05, (u32)offset); else ... > > The verifier will emit BPF_THIS_CPU_PTR pseudo insn with > pcpu_hot.cpu_number offset when it inlines bpf_get_smp_processor_id(). > > At the same time it can inline bpf_this_cpu_ptr() with the same insn. > sounds good, we'll need to guard all that with arch-specific check whether JIT supports this instruction, but that's ok (eventually we should be able to make this insn support mandatory and clean up those checks) > And we can finally implement map_ops->map_gen_lookup for > per-cpu array and hash map using the same pseudo insn. > Those lookups are often used in critical path and > the inlining will give a noticeable performance boost. yep, I'll look into that as well > > > As an interim compromise and solution, would you like me to implement > > a similar inlining for bpf_this_cpu_ptr() as well? It's just as > > trivial to do as for bpf_get_smp_processor_id(), and would benefit all > > existing users of bpf_this_cpu_ptr() as well, while relying on > > existing BTF information and surrounding infrastructure? > > yes via pseudo insn. > > after per-cpu maps, various bpf_*redirect*() helpers > can be inlined and XDP folks will enjoy extra performance, > and likely other cases where we used different solutions > instead of emitting per-cpu accesses in the verifier. that part I'll leave up to XDP folks :)