On Thu, 2024-06-27 at 17:43 -0700, Alexei Starovoitov wrote:
> On Thu, Jun 27, 2024 at 2:09 AM Ilya Leoshkevich <iii@xxxxxxxxxxxxx> wrote:
> >
> > s390x supports most BPF atomics using single instructions, which
> > makes implementing arena support a matter of adding the arena address
> > to the base register (unfortunately atomics do not support index
> > registers), and wrapping the respective native instruction in
> > probing sequences.
> >
> > An exception is BPF_XCHG, which is implemented using two different
> > memory accesses and a loop. Make sure there are enough extable
> > entries for both instructions. Compute the base address once for
> > both memory accesses. Since on exception we need to land after the
> > loop, emit the nops manually.
> >
> > Signed-off-by: Ilya Leoshkevich <iii@xxxxxxxxxxxxx>
> > ---
> >  arch/s390/net/bpf_jit_comp.c | 100 +++++++++++++++++++++++++++++++----
> >  1 file changed, 91 insertions(+), 9 deletions(-)

[...]

> > +
> > +bool bpf_jit_supports_insn(struct bpf_insn *insn, bool in_arena)
> > +{
> > +	/*
> > +	 * Currently the verifier uses this function only to check which
> > +	 * atomic stores to arena are supported, and they all are.
> > +	 */
> > +	return true;
>
> Including all the multi insn instructions that are implemented as loops?
> On x86 I left out atomic+fetch+[and|or|xor],
> because they're tricky with looping.
> Just checking that when an exception happens
> the loop is not going to become infinite ?
> If I'm reading the code correctly the exception handling will not only
> skip one insn, but will skip the whole loop?

On s390x only BPF_XCHG needs to be implemented as a loop; the rest are single instructions. For example, there is LOAD AND EXCLUSIVE OR, which is atomic, updates memory, and puts the original value into a register. For BPF_XCHG the exception handler will skip the entire loop after an exception.
BPF_XCHG has two memory accesses: the initial LOAD, and then the COMPARE AND SWAP loop. I wasn't able to test the exception handling for COMPARE AND SWAP, because I would have to inject a race that would free the arena page after the initial LOAD. Now that you asked, I added the following temporary patch to skip the LOAD:

--- a/arch/s390/net/bpf_jit_comp.c
+++ b/arch/s390/net/bpf_jit_comp.c
@@ -1598,10 +1598,8 @@ static noinline int bpf_jit_insn(struct bpf_jit *jit, struct bpf_prog *fp,
 		struct bpf_jit_probe load_probe = probe;
 
 		bpf_jit_probe_atomic_pre(jit, insn, &load_probe);
-		/* {ly|lg} %w0,off(%arena) */
-		EMIT6_DISP_LH(0xe3000000,
-			      is32 ? 0x0058 : 0x0004, REG_W0, REG_0,
-			      load_probe.arena_reg, off);
+		/* bcr 0,%0 (nop) */
+		_EMIT2(0x0700);
 		bpf_jit_probe_emit_nop(jit, &load_probe);
 		/* Reuse {ly|lg}'s arena_reg for {csy|csg}. */
 		if (load_probe.prg != -1) {

This is still a valid BPF_XCHG implementation, just less efficient in the non-contended case. The exception handling works, but I found a bug: the hard-coded offset in

	/* brc 4,0b */
	EMIT4_PCREL_RIC(0xa7040000, 4, jit->prg - 6);

is no longer valid due to the extra nop added by this patch. I will fix this and resend.