On Mon, Mar 14, 2022 at 07:25 PM +01, Ilya Leoshkevich wrote: > On Mon, 2022-03-14 at 18:35 +0100, Jakub Sitnicki wrote: >> On Thu, Mar 10, 2022 at 11:57 PM +01, Jakub Sitnicki wrote: >> > On Wed, Mar 09, 2022 at 01:34 PM +01, Ilya Leoshkevich wrote: >> > > On Wed, 2022-03-09 at 09:36 +0100, Jakub Sitnicki wrote: >> > >> > [...] >> > >> > > > >> > > > Consider this - today the below is true on both LE and BE, >> > > > right? >> > > > >> > > > *(u32 *)&ctx->remote_port == *(u16 *)&ctx->remote_port >> > > > >> > > > because the loads get converted to: >> > > > >> > > > *(u16 *)&ctx_kern->sport == *(u16 *)&ctx_kern->sport >> > > > >> > > > IOW, today, because of the bug that you are fixing here, the >> > > > data >> > > > layout >> > > > changes from the PoV of the BPF program depending on the load >> > > > size. >> > > > >> > > > With 2-byte loads, without this patch, the data layout appears >> > > > as: >> > > > >> > > > struct bpf_sk_lookup { >> > > > ... >> > > > __be16 remote_port; >> > > > __be16 remote_port; >> > > > ... >> > > > } >> > > >> > > I see, one can indeed argue that this is also a part of the ABI >> > > now. >> > > So we're stuck between a rock and a hard place. >> > > >> > > > While for 4-byte loads, it appears as in your 2nd patch: >> > > > >> > > > struct bpf_sk_lookup { >> > > > ... >> > > > #if little-endian >> > > > __be16 remote_port; >> > > > __u16 :16; /* zero padding */ >> > > > #elif big-endian >> > > > __u16 :16; /* zero padding */ >> > > > __be16 remote_port; >> > > > #endif >> > > > ... >> > > > } >> > > > >> > > > Because of that I don't see how we could keep complete ABI >> > > > compatiblity, >> > > > and have just one definition of struct bpf_sk_lookup that >> > > > reflects >> > > > it. These are conflicting requirements. >> > > > >> > > > I'd bite the bullet for 4-byte loads, for the sake of having an >> > > > endian-agnostic struct bpf_sk_lookup and struct bpf_sock >> > > > definition >> > > > in >> > > > the uAPI header. >> > > > >> > > > The sacrifice here is that the access converter will have to >> > > > keep >> > > > rewriting 4-byte access to bpf_sk_lookup.remote_port and >> > > > bpf_sock.dst_port in this unexpected, quirky manner. >> > > > >> > > > The expectation is that with time users will recompile their >> > > > BPF >> > > > progs >> > > > against the updated bpf.h, and switch to 2-byte loads. That >> > > > will make >> > > > the quirk in the access converter dead code in time. >> > > > >> > > > I don't have any better ideas. Sorry. >> > > > >> > > > [...] >> > > >> > > I agree, let's go ahead with this solution. >> > > >> > > The only remaining problem that I see is: the bug is in the >> > > common >> > > code, and it will affect the fields that we add in the future. >> > > >> > > Can we either document this state of things in a comment, or fix >> > > the >> > > bug and emulate the old behavior for certain fields? >> > >> > I think we can fix the bug in the common code, and compensate for >> > the >> > quirky 4-byte access to bpf_sk_lookup.remote_port and >> > bpf_sock.dst_port >> > in the is_valid_access and convert_ctx_access. >> > >> > With the patch as below, access to remote_port gets rewritten to: >> > >> > * size=1, offset=0, r2 = *(u8 *)(r1 +36) >> > 0: (69) r2 = *(u16 *)(r1 +4) >> > 1: (54) w2 &= 255 >> > 2: (b7) r0 = 0 >> > 3: (95) exit >> > >> > * size=1, offset=1, r2 = *(u8 *)(r1 +37) >> > 0: (69) r2 = *(u16 *)(r1 +4) >> > 1: (74) w2 >>= 8 >> > 2: (54) w2 &= 255 >> > 3: (b7) r0 = 0 >> > 4: (95) exit >> > >> > * size=1, offset=2, r2 = *(u8 *)(r1 +38) >> > 0: (b4) w2 = 0 >> > 1: (54) w2 &= 255 >> > 2: (b7) r0 = 0 >> > 3: (95) exit >> > >> > * size=1, offset=3, r2 = *(u8 *)(r1 +39) >> > 0: (b4) w2 = 0 >> > 1: (74) w2 >>= 8 >> > 2: (54) w2 &= 255 >> > 3: (b7) r0 = 0 >> > 4: (95) exit >> > >> > * size=2, offset=0, r2 = *(u16 *)(r1 +36) >> > 0: (69) r2 = *(u16 *)(r1 +4) >> > 1: (b7) r0 = 0 >> > 2: (95) exit >> > >> > * size=2, offset=2, r2 = *(u16 *)(r1 +38) >> > 0: (b4) w2 = 0 >> > 1: (b7) r0 = 0 >> > 2: (95) exit >> > >> > * size=4, offset=0, r2 = *(u32 *)(r1 +36) >> > 0: (69) r2 = *(u16 *)(r1 +4) >> > 1: (b7) r0 = 0 >> > 2: (95) exit >> > >> > How does that look to you? >> > >> > Still need to give it a test on s390x. >> >> Context conversion with patch below applied looks correct to me on >> s390x >> as well: >> >> * size=1, offset=0, r2 = *(u8 *)(r1 +36) >> 0: (69) r2 = *(u16 *)(r1 +4) >> 1: (bc) w2 = w2 >> 2: (74) w2 >>= 8 >> 3: (bc) w2 = w2 >> 4: (54) w2 &= 255 >> 5: (bc) w2 = w2 >> 6: (b7) r0 = 0 >> 7: (95) exit >> >> * size=1, offset=1, r2 = *(u8 *)(r1 +37) >> 0: (69) r2 = *(u16 *)(r1 +4) >> 1: (bc) w2 = w2 >> 2: (54) w2 &= 255 >> 3: (bc) w2 = w2 >> 4: (b7) r0 = 0 >> 5: (95) exit >> >> * size=1, offset=2, r2 = *(u8 *)(r1 +38) >> 0: (b4) w2 = 0 >> 1: (bc) w2 = w2 >> 2: (74) w2 >>= 8 >> 3: (bc) w2 = w2 >> 4: (54) w2 &= 255 >> 5: (bc) w2 = w2 >> 6: (b7) r0 = 0 >> 7: (95) exit >> >> * size=1, offset=3, r2 = *(u8 *)(r1 +39) >> 0: (b4) w2 = 0 >> 1: (bc) w2 = w2 >> 2: (54) w2 &= 255 >> 3: (bc) w2 = w2 >> 4: (b7) r0 = 0 >> 5: (95) exit >> >> * size=2, offset=0, r2 = *(u16 *)(r1 +36) >> 0: (69) r2 = *(u16 *)(r1 +4) >> 1: (bc) w2 = w2 >> 2: (b7) r0 = 0 >> 3: (95) exit >> >> * size=2, offset=2, r2 = *(u16 *)(r1 +38) >> 0: (b4) w2 = 0 >> 1: (bc) w2 = w2 >> 2: (b7) r0 = 0 >> 3: (95) exit >> >> * size=4, offset=0, r2 = *(u32 *)(r1 +36) >> 0: (69) r2 = *(u16 *)(r1 +4) >> 1: (bc) w2 = w2 >> 2: (b7) r0 = 0 >> 3: (95) exit >> >> If we go this way, this should unbreak the bpf selftests on BE, >> independently of the patch 1 from this series. >> >> Will send it as a patch, so that we continue the review discussion. > > I applied this patch to bpf-next, and while it looks reasonable, the > test still fails, e.g. here: > > /* Load from remote_port field with zero padding (backward > compatibility) */ > val_u32 = *(__u32 *)&ctx->remote_port; > if (val_u32 != bpf_htonl(bpf_ntohs(SRC_PORT) << 16)) > return SK_DROP; > You are right. That's that the check I recently added that broke the bpf CI runs for s390x [1], and started our thread. I had a stale build of test_progs with a fix akin to patch [2] that I was testing and missed that. Thanks for giving it a run. If we go with Martin's suggestion [3] here and avoid bpf_htonl(), then we could make it work and save ourselves endianess checks. IOW, a patch like below would be also needed to unbreak the tests. First chunk is copied from your fixes to test_sk_lookup in patch 3 in this RFC series, naturally. [1] https://lore.kernel.org/bpf/CAEf4BzaRNLw9_EnaMo5e46CdEkzbJiVU3j9oxnsemBKjNFf3wQ@xxxxxxxxxxxxxx/ [2] https://lore.kernel.org/bpf/20220227202757.519015-4-jakub@xxxxxxxxxxxxxx/ [3] https://lore.kernel.org/bpf/20220301062207.d3aqge5qg623asr6@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/ ---8<--- diff --git a/tools/testing/selftests/bpf/progs/test_sk_lookup.c b/tools/testing/selftests/bpf/progs/test_sk_lookup.c index bf5b7caefdd0..2765a4bd500c 100644 --- a/tools/testing/selftests/bpf/progs/test_sk_lookup.c +++ b/tools/testing/selftests/bpf/progs/test_sk_lookup.c @@ -413,15 +413,14 @@ int ctx_narrow_access(struct bpf_sk_lookup *ctx) /* Narrow loads from remote_port field. Expect SRC_PORT. */ if (LSB(ctx->remote_port, 0) != ((SRC_PORT >> 0) & 0xff) || - LSB(ctx->remote_port, 1) != ((SRC_PORT >> 8) & 0xff) || - LSB(ctx->remote_port, 2) != 0 || LSB(ctx->remote_port, 3) != 0) + LSB(ctx->remote_port, 1) != ((SRC_PORT >> 8) & 0xff)) return SK_DROP; if (LSW(ctx->remote_port, 0) != SRC_PORT) return SK_DROP; /* Load from remote_port field with zero padding (backward compatibility) */ val_u32 = *(__u32 *)&ctx->remote_port; - if (val_u32 != bpf_htonl(bpf_ntohs(SRC_PORT) << 16)) + if (val_u32 != SRC_PORT) return SK_DROP; /* Narrow loads from local_port field. Expect DST_PORT. */