Re: selftest sock_fields failed on s390x with latest llvm17

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sun, 2023-05-14 at 09:58 -0700, Yonghong Song wrote:
> 
> 
> On 5/13/23 1:24 AM, Ilya Leoshkevich wrote:
> > On Fri, 2023-05-12 at 21:13 -0700, Yonghong Song wrote:
> > > 
> > > 
> > > On 5/12/23 7:40 AM, Ilya Leoshkevich wrote:
> > > > On Wed, 2023-05-03 at 21:46 +0200, Ilya Leoshkevich wrote:
> > > > > On Wed, 2023-05-03 at 12:35 -0700, Yonghong Song wrote:
> > > > > > Hi, Ilya,
> > > > > > 
> > > > > > BPF CI ([1]) detected a s390x failure when bpf program is
> > > > > > compiled
> > > > > > with
> > > > > > latest llvm17 on bpf-next branch. To reproduce the issue,
> > > > > > just
> > > > > > run
> > > > > > normal './test_progs -j'. The failure log looks like below:
> > > > > > 
> > > > > > Notice: Success: 341/3015, Skipped: 29, Failed: 1
> > > > > > Error: #191 sock_fields
> > > > > >      Error: #191 sock_fields
> > > > > >      create_netns:PASS:create netns 0 nsec
> > > > > >      create_netns:PASS:bring up lo 0 nsec
> > > > > >     
> > > > > > serial_test_sock_fields:PASS:test_sock_fields__open_and_loa
> > > > > > d 0
> > > > > > nsec
> > > > > >     
> > > > > > serial_test_sock_fields:PASS:attach_cgroup(egress_read_sock
> > > > > > _fie
> > > > > > lds)
> > > > > > 0
> > > > > > nsec
> > > > > >     
> > > > > > serial_test_sock_fields:PASS:attach_cgroup(ingress_read_soc
> > > > > > k_fi
> > > > > > elds
> > > > > > )
> > > > > > 0 nsec
> > > > > >     
> > > > > > serial_test_sock_fields:PASS:attach_cgroup(read_sk_dst_port
> > > > > > 0
> > > > > > nsec
> > > > > >      test:PASS:getsockname(listen_fd) 0 nsec
> > > > > >      test:PASS:getsockname(cli_fd) 0 nsec
> > > > > >      test:PASS:accept(listen_fd) 0 nsec
> > > > > >     
> > > > > > init_sk_storage:PASS:bpf_map_update_elem(sk_pkt_out_cnt_fd)
> > > > > > 0
> > > > > > nsec
> > > > > >     
> > > > > > init_sk_storage:PASS:bpf_map_update_elem(sk_pkt_out_cnt10_f
> > > > > > d) 0
> > > > > > nsec
> > > > > >      test:PASS:send(accept_fd) 0 nsec
> > > > > >      test:PASS:recv(cli_fd) 0 nsec
> > > > > >      test:PASS:send(accept_fd) 0 nsec
> > > > > >      test:PASS:recv(cli_fd) 0 nsec
> > > > > >      test:PASS:recv(accept_fd) for fin 0 nsec
> > > > > >      test:PASS:recv(cli_fd) for fin 0 nsec
> > > > > >     
> > > > > > check_sk_pkt_out_cnt:PASS:bpf_map_lookup_elem(sk_pkt_out_cn
> > > > > > t,
> > > > > > &accept_fd) 0 nsec
> > > > > >     
> > > > > > check_sk_pkt_out_cnt:PASS:bpf_map_lookup_elem(sk_pkt_out_cn
> > > > > > t,
> > > > > > &cli_fd) 0 nsec
> > > > > >      check_result:PASS:bpf_map_lookup_elem(linum_map_fd) 0
> > > > > > nsec
> > > > > >      check_result:PASS:bpf_map_lookup_elem(linum_map_fd) 0
> > > > > > nsec
> > > > > >      check_result:PASS:bpf_map_lookup_elem(linum_map_fd,
> > > > > > READ_SK_DST_PORT_IDX) 0 nsec
> > > > > >      check_result:FAIL:failure in read_sk_dst_port on line
> > > > > > unexpected
> > > > > > failure in read_sk_dst_port on line: actual 297 != expected
> > > > > > 0
> > > > > >      listen_sk: state:10 bound_dev_if:0 family:10 type:1
> > > > > > protocol:6
> > > > > > mark:0
> > > > > > priority:0 src_ip4:7f000006(127.0.0.6) src_ip6:0:0:0:1(::1)
> > > > > > src_port:51966 dst_ip4:0(0.0.0.0) dst_ip6:0:0:0:0(::)
> > > > > > dst_port:0
> > > > > >      srv_sk: state:9 bound_dev_if:0 family:10 type:1
> > > > > > protocol:6
> > > > > > mark:0
> > > > > > priority:0 src_ip4:7f000006(127.0.0.6) src_ip6:0:0:0:1(::1)
> > > > > > src_port:51966 dst_ip4:7f000006(127.0.0.6)
> > > > > > dst_ip6:0:0:0:1(::1)
> > > > > > dst_port:38030
> > > > > >      cli_sk: state:5 bound_dev_if:0 family:10 type:1
> > > > > > protocol:6
> > > > > > mark:0
> > > > > > priority:0 src_ip4:7f000006(127.0.0.6) src_ip6:0:0:0:1(::1)
> > > > > > src_port:38030 dst_ip4:0(0.0.0.0) dst_ip6:0:0:0:1(::1)
> > > > > > dst_port:51966
> > > > > >      listen_tp: snd_cwnd:10 srtt_us:0 rtt_min:4294967295
> > > > > > snd_ssthresh:2147483647 rcv_nxt:0 snd_nxt:0 snd:una:0
> > > > > > mss_cache:536
> > > > > > ecn_flags:0 rate_delivered:0 rate_interval_us:0
> > > > > > packets_out:0
> > > > > > retrans_out:0 total_retrans:0 segs_in:0 data_segs_in:0
> > > > > > segs_out:0
> > > > > > data_segs_out:0 lost_out:0 sacked_out:0 bytes_received:0
> > > > > > bytes_acked:0
> > > > > >      srv_tp: snd_cwnd:10 srtt_us:3904 rtt_min:272
> > > > > > snd_ssthresh:2147483647
> > > > > > rcv_nxt:648617715 snd_nxt:4218065810 snd:una:4218065810
> > > > > > mss_cache:32768
> > > > > > ecn_flags:0 rate_delivered:1 rate_interval_us:272
> > > > > > packets_out:0
> > > > > > retrans_out:0 total_retrans:0 segs_in:5 data_segs_in:0
> > > > > > segs_out:3
> > > > > > data_segs_out:2 lost_out:0 sacked_out:0 bytes_received:1
> > > > > > bytes_acked:22
> > > > > >      cli_tp: snd_cwnd:10 srtt_us:6035 rtt_min:730
> > > > > > snd_ssthresh:2147483647
> > > > > > rcv_nxt:4218065811 snd_nxt:648617715 snd:una:648617715
> > > > > > mss_cache:32768
> > > > > > ecn_flags:0 rate_delivered:1 rate_interval_us:925
> > > > > > packets_out:0
> > > > > > retrans_out:0 total_retrans:0 segs_in:4 data_segs_in:2
> > > > > > segs_out:6
> > > > > > data_segs_out:0 lost_out:0 sacked_out:0 bytes_received:23
> > > > > > bytes_acked:2
> > > > > >      check_result:PASS:listen_sk 0 nsec
> > > > > >      check_result:PASS:srv_sk 0 nsec
> > > > > >      check_result:PASS:srv_tp 0 nsec
> > > > > > 
> > > > > > If bpf program is compiled with llvm16, the test passed
> > > > > > according
> > > > > > to
> > > > > > a CI run.
> > > > > > 
> > > > > > I don't have s390x environment to debug this. Could you
> > > > > > help
> > > > > > debug
> > > > > > it?
> > > > > > 
> > > > > > Thanks!
> > > > > > 
> > > > > >      [1]
> > > > > > https://github.com/kernel-patches/vmtest/actions/runs/4866851496/jobs/8679080985?pr=224#step:6:7645
> > > > > 
> > > > > 
> > > > > Hi,
> > > > > 
> > > > > thank for letting me know.
> > > > > I will look into this.
> > > > > 
> > > > > Best regards,
> > > > > Ilya
> > > > 
> > > > In the meantime the issue was fixed by:
> > > > 
> > > > commit 141be5c062ecf22bd287afffd310e8ac4711444a
> > > > Author: Shoaib Meenai <smeenai@xxxxxx>
> > > > Date:   Fri May 5 14:18:12 2023 -0700
> > > > 
> > > >       Revert "Reland [Pipeline] Don't limit ArgumentPromotion
> > > > to -
> > > > O3"
> > > >       
> > > >       This reverts commit
> > > > 6f29d1adf29820daae9ea7a01ae2588b67735b9e.
> > > >       
> > > >       https://reviews.llvm.org/D149768  ; is causing size
> > > > regressions
> > > > for -
> > > > Oz
> > > >       with FullLTO, and I'm reverting that one while
> > > > investigating.
> > > > This
> > > >       commit depends on that one, so it needs to be reverted as
> > > > well.
> > > 
> > > The transformtion "Don't limit ArgumentPromotion to -O3" is
> > > temporarily
> > > reverted. But it could be reverted again once the issue is
> > > resolved.
> > > So it is a good idea to resolve the issue in the kernel.
> > > 
> > > > 
> > > > But looking at the codegen differences:
> > > > 
> > > > $ diff -u <(sed -e s/[0-9]*://g pass.s) <(sed -e s/[0-9]*://g
> > > > fail.s)
> > > > 
> > > > -pass.o:        file format elf64-bpf
> > > > +fail.o:        file format elf64-bpf
> > > > 
> > > > -00000000000002c8 <sk_dst_port__load_half>
> > > > -       69 11 00 30 00 00 00 00 r1 = *(u16 *)(r1 + 48)
> > > > +00000000000002c0 <sk_dst_port__load_half>
> > > > +       54 10 00 00 00 00 ff ff w1 &= 65535
> > > >           b4 00 00 00 00 00 00 01 w0 = 1
> > > >           16 10 00 01 00 00 ca fe if w1 == 51966 goto +1
> > > > <LBB6_2>
> > > >           b4 00 00 00 00 00 00 00 w0 = 0
> > > > 
> > > > This is what ArgumentPromotion is supposed to do, so that's
> > > > okay so
> > > > far. However, further down below we have:
> > > > 
> > > >    Disassembly of section cgroup_skb/egress:
> > > > 
> > > > -       bf 16 00 00 00 00 00 00 r1 = r6
> > > > +       61 76 00 30 00 00 00 00 r7 = *(u32 *)(r6 + 48)
> > > > +       bc 17 00 00 00 00 00 00 w1 = w7
> > > >           85 01 00 00 00 00 00 53 call sk_dst_port__load_word
> > > > 
> > > > ...
> > > > 
> > > > -       bf 16 00 00 00 00 00 00 r1 = r6
> > > > +       74 70 00 00 00 00 00 10 w7 >>= 16
> > > > +       bc 17 00 00 00 00 00 00 w1 = w7
> > > >           85 01 00 00 00 00 00 57 call sk_dst_port__load_half
> > > > 
> > > > so there is no 16-bit load anymore, instead, the result from
> > > > the
> > > > earlier 32-bit load is reused. However, on BE these kinds of
> > > > loads
> > > > for this particular field are not consistent at the moment -
> > > > see
> > > > [1]
> > > > and the previous discussions.
> > > > 
> > > > De-facto we have the following results:
> > > > 
> > > > - int load: 0x0000cafe
> > > > - short load: 0xcafe
> > > 
> > > So 'De-facto' means the above is the expected result.
> > > 
> > > > 
> > > > On a consistent BE we should have rather had:
> > > > 
> > > > - int load: 0x0000cafe
> > > > - short load: 0
> > > 
> > > What does 'consistent BE' mean here? Does it mean the expected
> > > result from the source code itself?
> > 
> > I should not have called the de-facto example "BE" at all: it's
> > rather
> > "mixed endianness" or "weird endianness" or something along these
> > lines.
> > 
> > On "consistent BE" or simply "BE" properties like
> > 
> > *(uint32_t *)p = (*(uint16_t *)p << 16) | *(uint16_t *)(p + 2);
> > 
> > hold. This is currently not the case for bpf_sock.dst_port.
> > 
> > We compile with -mbig-endian, so we promise to the compiler that
> > the
> > machine is big-endian, and the compiler expects the above to hold
> > for
> > any p. Unfortunately when p points to bpf_sock.dst_port, this is
> > not
> > the case.
> 
> If I understand correctly, *(uint32_t *)p to get the
> bpf_sock.dst_port
> is for backward compatibility. But the real u32 read by compiler will
> do (*(uint16_t *)p << 16) | *(uint16_t *)(p + 2) which is not the
> same as expected *(uint32_t *)p so we have problem here.
> 
> > 
> > The property above is important for the correctness of the
> > load/store
> > tearing transformations. What we have here is not exactly tearing,
> > but
> > is quite close.
> > 
> > > > Clang, of course, expects a consistent BE and optimizes around
> > > > that.
> > > > 
> > > > This was a conscious tradeoff Jakub and I have agreed on in
> > > > order
> > > > to
> > > > keep the quirky behavior from the past. Given what's happening
> > > > with
> > > > Clang now, I wonder if it would be worth revisiting it in the
> > > > name
> > > > of
> > > > consistency?
> > > 
> > > If I understand correctly, I think the issue is
> > >       r7 = *(u32 *)(r6 + 48)
> > >       w7 >= 16
> > >       w1 = w7
> > > 
> > > after verifier, it is changed to
> > >      r7 = *(u16 *)(r6 + <kernel offset>)
> > >      w7 >= 16
> > >      w1 = w7
> > > 
> > > and the result after verifier rewrite is completely wrong.
> > > Is it right?
> > 
> > No, the verifier rewrite is correct.
> > The sk_dst_port__load_word() part of the test succeeds.
> > 
> > All these rewrites already work fine, they are correct and
> > consistent.
> > It's really just bpf_sock.dst_port that is special.
> > 
> > > If this is the case, during verifier rewrite, if it is
> > > big endian, say the user intends to load 4 bytes (from uapi
> > > header)
> > > while the kernel field is 2 bytes, in such cases, kernel
> > > can still pretend to generate 4-byte load. For example,
> > > for the above example, the code after verification could be:
> > >      r7 = *(u16 *)(r6 + <kernel offset>)
> > >      r7 <= 16
> > >      w7 >= 16
> > >      w1 = w7
> > > 
> > > Hopefully, we won't have many such cases. Does this work?
> > 
> > This would break the sk_dst_port__load_word() part of the test.
> 
> This is a hack. This may work for this specific u16 case, but
> yes, it won't work for u32 load case.
> 
> > 
> > 
> > 
> > Above I asked whether we can resolve the inconsistency, but I
> > thought
> > about it and I don't see a way of doing it without breaking the
> > ABI,
> > which is at worst unacceptable, and at best a last resort measure.
> > 
> > What do you think about marking bpf_sock.dst_port volatile?
> > volatile
> > should prevent tearing and similar optimizations, with which we
> > have a
> > problem here.
> > 
> > We could also add a comment warning users not to cast away this
> > volatile due to the quirk we have. And then we should adjust the
> > test
> > (making all casts volatile) to comply with this new warning.
> 
> I did a little study on this. The main problem here for
> static __noinline bool sk_dst_port__load_half(struct bpf_sock *sk)
> {
>          __u16 *half = (__u16 *)&sk->dst_port;
>          return half[0] == bpf_htons(0xcafe);
> }
> 
> Through some cross-function optimization by ArgumentPromotion
> optimization, the compiler does:
>     /* the below shared by sk_dst_port__load_word
>      * and sk_dst_port__load_half
>      */
>     __u32 *word = (__u32 *)&sk->dst_port;
>     __u32 word_val = word[0];
> 
>     /* the below is for sk_dst_port__load_half only */
>     __u16 half_val = word_val >> 16;
> 
>     ... half_val passed into sk_dst_port__load_half ...
>     return half_val == bpf_htons(0xcafe);
> 
> Here, 'word_val = word[0]' is replaced by 2-byte load
> by the verifier and this caused the trouble for later
> sk_dst_port__load_half().
> 
> I don't have a good solution here. The issue is exposed
> as we have both u16 and u32 load for &sk->dst_port and
> the compiler did some optimization with this.
> 
> I would say this is an extreme corner case and we can just
> fix in the source code like below:
> 
> diff --git a/tools/testing/selftests/bpf/progs/test_sock_fields.c 
> b/tools/testing/selftests/bpf/progs/test_sock_fields.c
> index bbad3c2d9aa5..39c975786720 100644
> --- a/tools/testing/selftests/bpf/progs/test_sock_fields.c
> +++ b/tools/testing/selftests/bpf/progs/test_sock_fields.c
> @@ -265,7 +265,10 @@ static __noinline bool 
> sk_dst_port__load_word(struct bpf_sock *sk)
> 
>   static __noinline bool sk_dst_port__load_half(struct bpf_sock *sk)
>   {
> -       __u16 *half = (__u16 *)&sk->dst_port;
> +       __u16 *half;
> +
> +       asm volatile ("");
> +       half  = (__u16 *)&sk->dst_port;
>          return half[0] == bpf_htons(0xcafe);
>   }
> 
> Could you try whether the above workaround works or not?
> If we want the code to be future proof for potential
> cross-func optimization for these noinline functions, we
> can add similar asm codes to all of
> bool sk_dst_port__load_{word, half, byte}.

Hi,

this makes the issue go away, thanks.

However, I'm still concerned, because this only inhibits a certain
optimization and does not address the underlying fundamental problem:
we promise to clang that the in-kernel implementation of the eBPF
virtual machine is big-endian, while in reality it's not. As compiler
optimizations get more aggressive, we will surely see more of this.

Why not do this instead?

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 1bb11a6ee667..3c9b535532ae 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -6102,7 +6102,7 @@ struct bpf_sock {
 	__u32 src_ip4;
 	__u32 src_ip6[4];
 	__u32 src_port;		/* host byte order */
-	__be16 dst_port;	/* network byte order */
+	volatile __be16 dst_port;	/* network byte order */
 	__u16 :16;		/* zero padding */
 	__u32 dst_ip4;
 	__u32 dst_ip6[4];
diff --git a/tools/include/uapi/linux/bpf.h
b/tools/include/uapi/linux/bpf.h
index 1bb11a6ee667..3c9b535532ae 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -6102,7 +6102,7 @@ struct bpf_sock {
 	__u32 src_ip4;
 	__u32 src_ip6[4];
 	__u32 src_port;		/* host byte order */
-	__be16 dst_port;	/* network byte order */
+	volatile __be16 dst_port;	/* network byte order */
 	__u16 :16;		/* zero padding */
 	__u32 dst_ip4;
 	__u32 dst_ip6[4];
diff --git a/tools/testing/selftests/bpf/progs/test_sock_fields.c
b/tools/testing/selftests/bpf/progs/test_sock_fields.c
index bbad3c2d9aa5..773ded84ac12 100644
--- a/tools/testing/selftests/bpf/progs/test_sock_fields.c
+++ b/tools/testing/selftests/bpf/progs/test_sock_fields.c
@@ -259,19 +259,19 @@ int ingress_read_sock_fields(struct __sk_buff
*skb)
  */
 static __noinline bool sk_dst_port__load_word(struct bpf_sock *sk)
 {
-	__u32 *word = (__u32 *)&sk->dst_port;
+	volatile __u32 *word = (volatile __u32 *)&sk->dst_port;
 	return word[0] == bpf_htons(0xcafe);
 }
 
 static __noinline bool sk_dst_port__load_half(struct bpf_sock *sk)
 {
-	__u16 *half = (__u16 *)&sk->dst_port;
+	volatile __u16 *half = (volatile __u16 *)&sk->dst_port;
 	return half[0] == bpf_htons(0xcafe);
 }
 
 static __noinline bool sk_dst_port__load_byte(struct bpf_sock *sk)
 {
-	__u8 *byte = (__u8 *)&sk->dst_port;
+	volatile __u8 *byte = (volatile __u8 *)&sk->dst_port;
 	return byte[0] == 0xca && byte[1] == 0xfe;
 }
 
This also works, and as far as I'm concerned, this would be a proper
fix for the underlying issue: we tell the compiler that it should never
ever (with any of today's or future optimizations) try to be clever
when accessing dst_port.

Best regards,
Ilya


> > > > [1]
> > > > https://lore.kernel.org/bpf/20220317113920.1068535-5-jakub@xxxxxxxxxxxxxx






[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux