On Tue, Feb 22, 2022 at 10:51 PM +01, Ilya Leoshkevich wrote: > On Tue, 2022-02-22 at 19:24 +0100, Jakub Sitnicki wrote: >> On Tue, Feb 22, 2022 at 06:42 PM +01, Ilya Leoshkevich wrote: >> > On Tue, 2022-02-22 at 15:53 +0100, Jakub Sitnicki wrote: >> > > On Tue, Feb 22, 2022 at 03:22 AM +01, Ilya Leoshkevich wrote: >> > > > On Tue, 2022-02-22 at 01:43 +0100, Ilya Leoshkevich wrote: >> > > > > On Mon, 2022-02-21 at 22:39 +0100, Ilya Leoshkevich wrote: >> > > > > > On Mon, 2022-02-21 at 19:03 +0100, Jakub Sitnicki wrote: >> > > > > > > Shifting 16-bit type by 16 bits is implementation-defined >> > > > > > > for >> > > > > > > BPF >> > > > > > > programs. >> > > > > > > Don't rely on it in case it is causing the test failures >> > > > > > > we >> > > > > > > are >> > > > > > > seeing on >> > > > > > > s390x z15 target. >> > > > > > > >> > > > > > > Fixes: 2ed0dc5937d3 ("selftests/bpf: Cover 4-byte load >> > > > > > > from >> > > > > > > remote_port in bpf_sk_lookup") >> > > > > > > Reported-by: Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> >> > > > > > > Signed-off-by: Jakub Sitnicki <jakub@xxxxxxxxxxxxxx> >> > > > > > > --- >> > > > > > > >> > > > > > > I don't have a dev env for s390x/z15 set up yet, so can't >> > > > > > > definitely >> > > > > > > confirm the fix. >> > > > > > > That said, it seems worth fixing either way. >> > > > > > > >> > > > > > > tools/testing/selftests/bpf/progs/test_sk_lookup.c | 3 >> > > > > > > ++- >> > > > > > > 1 file changed, 2 insertions(+), 1 deletion(-) >> > > > > > > >> > > > > > > diff --git >> > > > > > > a/tools/testing/selftests/bpf/progs/test_sk_lookup.c >> > > > > > > b/tools/testing/selftests/bpf/progs/test_sk_lookup.c >> > > > > > > index bf5b7caefdd0..7d47276a8964 100644 >> > > > > > > --- a/tools/testing/selftests/bpf/progs/test_sk_lookup.c >> > > > > > > +++ b/tools/testing/selftests/bpf/progs/test_sk_lookup.c >> > > > > > > @@ -65,6 +65,7 @@ static const __u32 KEY_SERVER_A = >> > > > > > > SERVER_A; >> > > > > > > static const __u32 KEY_SERVER_B = SERVER_B; >> > > > > > > >> > > > > > > static const __u16 SRC_PORT = bpf_htons(8008); >> > > > > > > +static const __u32 SRC_PORT_U32 = bpf_htonl(8008U << >> > > > > > > 16); >> > > > > > > static const __u32 SRC_IP4 = IP4(127, 0, 0, 2); >> > > > > > > static const __u32 SRC_IP6[] = IP6(0xfd000000, 0x0, 0x0, >> > > > > > > 0x00000002); >> > > > > > > >> > > > > > > @@ -421,7 +422,7 @@ int ctx_narrow_access(struct >> > > > > > > bpf_sk_lookup >> > > > > > > *ctx) >> > > > > > > >> > > > > > > /* Load from remote_port field with zero padding >> > > > > > > (backward >> > > > > > > compatibility) */ >> > > > > > > val_u32 = *(__u32 *)&ctx->remote_port; >> > > > > > > - if (val_u32 != bpf_htonl(bpf_ntohs(SRC_PORT) << >> > > > > > > 16)) >> > > > > > > + if (val_u32 != SRC_PORT_U32) >> > > > > > > return SK_DROP; >> > > > > > > >> > > > > > > /* Narrow loads from local_port field. Expect >> > > > > > > DST_PORT. >> > > > > > > */ >> > > > > > >> > > > > > Unfortunately this doesn't help with the s390 problem. >> > > > > > I'll try to debug this. >> > > > > >> > > > > I have to admit I have a hard time wrapping my head around >> > > > > the >> > > > > requirements here. >> > > > > >> > > > > Based on the pre-9a69e2b385f4 code, do I understand correctly >> > > > > that >> > > > > for the following input >> > > > > >> > > > > Port: 0x1f48 >> > > > > SRC_PORT: 0x481f >> > > > > >> > > > > we expect the following results for different kinds of loads: >> > > > > >> > > > > Size Offset LE BE >> > > > > BPF_B 0 0x1f 0 >> > > > > BPF_B 1 0x48 0 >> > > > > BPF_B 2 0 0x48 >> > > > > BPF_B 3 0 0x1f >> > > > > BPF_H 0 0x481f 0 >> > > > > BPF_H 1 0 0x481f >> > > > > BPF_W 0 0x481f 0x481f >> > > > > >> > > > > and this is guaranteed by the struct bpf_sk_lookup ABI? >> > > > > Because >> > > > > then >> > > > > it >> > > > > looks as if 9a69e2b385f4 breaks it on big-endian as follows: >> > > > > >> > > > > Size Offset BE-9a69e2b385f4 >> > > > > BPF_B 0 0x48 >> > > > > BPF_B 1 0x1f >> > > > > BPF_B 2 0 >> > > > > BPF_B 3 0 >> > > > > BPF_H 0 0x481f >> > > > > BPF_H 1 0 >> > > > > BPF_W 0 0x481f0000 >> > > > >> > > > Sorry, I worded this incorrectly: 9a69e2b385f4 did not change >> > > > the >> > > > kernel behavior, the ABI is not broken and the old compiled >> > > > code >> > > > should >> > > > continue to work. >> > > > What the second table really shows are what the results should >> > > > be >> > > > according to the 9a69e2b385f4 struct bpf_sk_lookup definition, >> > > > which I >> > > > still think is broken on big-endian and needs to be adjusted to >> > > > match >> > > > the ABI. >> > > > >> > > > I noticed one other strange thing in the meantime: loads from >> > > > *(__u32 *)&ctx->remote_port, *(__u16 *)&ctx->remote_port and >> > > > *((__u16 *)&ctx->remote_port + 1) all produce 8008 on s390, >> > > > which >> > > > is >> > > > clearly inconsistent. It looks as if convert_ctx_accesses() >> > > > needs >> > > > to be >> > > > adjusted to handle combinations like ctx_field_size == 4 && >> > > > size == >> > > > 2 >> > > > && target_size == 2. I will continue with this tomorrow. >> > > > >> > > > > Or is the old behavior a bug and this new one is desirable? >> > > > > 9a69e2b385f4 has no Fixes: tag, so I assume that's the former >> > > > > :-( >> > > > > >> > > > > In which case, would it make sense to fix it by swapping >> > > > > remote_port >> > > > > and :16 in bpf_sk_lookup on big-endian? >> > > >> > > Thanks for looking into it. >> > > >> > > When it comes to requirements, my intention was to keep the same >> > > behavior as before the split up of the remote_port field in >> > > 9a69e2b385f4 >> > > ("bpf: Make remote_port field in struct bpf_sk_lookup 16-bit >> > > wide"). >> > > >> > > 9a69e2b385f4 was supposed to be a formality, after a similar >> > > change >> > > in >> > > 4421a582718a ("bpf: Make dst_port field in struct bpf_sock 16-bit >> > > wide"), which went in earlier. >> > > >> > > In 4421a582718a I've provided a bit more context. I understand >> > > that >> > > the >> > > remote_port value, even before the type changed from u32 to u16, >> > > appeared to the BPF program as if laid out in memory like so: >> > > >> > > offsetof(struct bpf_sk_lookup, remote_port) +0 <port MSB> >> > > +1 <port LSB> >> > > +2 0x00 >> > > +3 0x00 >> > > >> > > Translating it to your handy table format, I expect should result >> > > in >> > > loads as so if port is 8008 = 0x1f48: >> > > >> > > Size Offset LE BE >> > > BPF_B 0 0x1f 0x1f >> > > BPF_B 1 0x48 0x48 >> > > BPF_B 2 0 0 >> > > BPF_B 3 0 0 >> > > BPF_H 0 0x481f 0x1f48 >> > > BPF_H 1 0 0 >> > > BPF_W 0 0x481f 0x1f480000 >> > >> > Hmm, I think for big-endian the layout is different. >> > If we look at test_sk_lookup.c from 9a69e2b385f4^: >> > >> > /* Narrow loads from remote_port field. Expect SRC_PORT. */ >> > if (LSB(ctx->remote_port, 0) != ((SRC_PORT >> 0) & 0xff) || >> > LSB(ctx->remote_port, 1) != ((SRC_PORT >> 8) & 0xff) || >> > LSB(ctx->remote_port, 2) != 0 || LSB(ctx->remote_port, >> > 3) >> > != 0) >> > return SK_DROP; >> > >> > LSB() on little-endian is just byte indexing, so it's indeed >> > 1f,48,00,00. However, on big-endian it's indexing from the end, so >> > it's 00,00,48,1f. >> >> I understood that LSB() is indexing from the end on BE because >> SRC_PORT >> constant value differs on LE (= 0x481f) and BE (= 0x1f48) platforms, >> so >> >> LE BE >> SRC_PORT >> 0 1f 48 >> SRC_PORT >> 8 48 1f >> >> So on LE we first compare remote_port MSB, then LSB. >> While on BE we start with remote_port LSB, then MSB. >> >> But, now that you have pointed it out, I notice that >> sizeof(remote_port) >> has changed and from 4 to 2, and I can't see how LSB(…, 3) and LSB(…, >> 4) >> loads can keep working on big-endian. > > Oh, right - it should be 00,00,1f,48 on big-endian. > Out-of-bounds LSB is indeed an issue. I've posted my current > thoughts as an RFC series [1], this one is addressed in patch 3. > > [1] > https://lore.kernel.org/bpf/20220222182559.2865596-1-iii@xxxxxxxxxxxxx/ Sorry for the radio silience. I have patched my way through cross-compiling bpf selftests to s390, and can reproduce the sk_lookup test failure on big-endian. bpftool gen object/skeleton refusing to process BE BPF objects on an LE host got me stuck for a moment, but user mode qemu saved the day. FWIW, sock_fields test are also broken on BE, but failing silently, as I observed by instrumenting the read_sk_dst_port BPF prog with bpf_printk. I will be able check out the RFC series over this weekend.