On Thu, 2024-06-27 at 17:43 -0700, Alexei Starovoitov wrote:
> On Thu, Jun 27, 2024 at 2:09 AM Ilya Leoshkevich <iii@xxxxxxxxxxxxx> wrote:
> >
> > s390x supports most BPF atomics using single instructions, which
> > makes implementing arena support a matter of adding the arena address
> > to the base register (unfortunately atomics do not support index
> > registers), and wrapping the respective native instruction in
> > probing sequences.
> >
> > An exception is BPF_XCHG, which is implemented using two different
> > memory accesses and a loop. Make sure there are enough extable
> > entries for both instructions. Compute the base address once for
> > both memory accesses. Since on exception we need to land after the
> > loop, emit the nops manually.
> >
> > Signed-off-by: Ilya Leoshkevich <iii@xxxxxxxxxxxxx>
> > ---
> >  arch/s390/net/bpf_jit_comp.c | 100 +++++++++++++++++++++++++++++++----
> >  1 file changed, 91 insertions(+), 9 deletions(-)

[...]

> > +
> > +bool bpf_jit_supports_insn(struct bpf_insn *insn, bool in_arena)
> > +{
> > +	/*
> > +	 * Currently the verifier uses this function only to check which
> > +	 * atomic stores to arena are supported, and they all are.
> > +	 */
> > +	return true;
>
> Including all the multi insn instructions that are implemented as loops?
> On x86 I left out atomic+fetch+[and|or|xor],
> because they're tricky with looping.
> Just checking that when an exception happens
> the loop is not going to become infinite ?
> If I'm reading the code correctly the exception handling will not only
> skip one insn, but will skip the whole loop?

On s390x only BPF_XCHG needs to be implemented as a loop; the rest are single instructions. For example, there is LOAD AND EXCLUSIVE OR, which is atomic, updates memory, and puts the original value into a register. For BPF_XCHG the exception handler will skip the entire loop after an exception.
BPF_XCHG has two memory accesses: the initial LOAD, and then the COMPARE AND SWAP loop. I wasn't able to test the exception handling for COMPARE AND SWAP, because I would have to inject a race that would free the arena page after the initial LOAD. Now that you asked, I added the following temporary patch to skip the LOAD:

--- a/arch/s390/net/bpf_jit_comp.c
+++ b/arch/s390/net/bpf_jit_comp.c
@@ -1598,10 +1598,8 @@ static noinline int bpf_jit_insn(struct bpf_jit *jit, struct bpf_prog *fp,
 		struct bpf_jit_probe load_probe = probe;
 
 		bpf_jit_probe_atomic_pre(jit, insn, &load_probe);
-		/* {ly|lg} %w0,off(%arena) */
-		EMIT6_DISP_LH(0xe3000000,
-			      is32 ? 0x0058 : 0x0004, REG_W0, REG_0,
-			      load_probe.arena_reg, off);
+		/* bcr 0,%0 (nop) */
+		_EMIT2(0x0700);
 		bpf_jit_probe_emit_nop(jit, &load_probe);
 		/* Reuse {ly|lg}'s arena_reg for {csy|csg}. */
 		if (load_probe.prg != -1) {

This is still a valid BPF_XCHG implementation, just less efficient in the non-contended case. The exception handling works, but I found a bug: the hard-coded offset in

	/* brc 4,0b */
	EMIT4_PCREL_RIC(0xa7040000, 4, jit->prg - 6);

is no longer valid due to the extra nop added by this patch. I will fix this and resend.