Re: selftest sock_fields failed on s390x with latest llvm17

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, 2023-05-03 at 21:46 +0200, Ilya Leoshkevich wrote:
> On Wed, 2023-05-03 at 12:35 -0700, Yonghong Song wrote:
> > Hi, Ilya,
> > 
> > BPF CI ([1]) detected a s390x failure when bpf program is compiled
> > with 
> > latest llvm17 on bpf-next branch. To reproduce the issue, just run 
> > normal './test_progs -j'. The failure log looks like below:
> > 
> > Notice: Success: 341/3015, Skipped: 29, Failed: 1
> > Error: #191 sock_fields
> >    Error: #191 sock_fields
> >    create_netns:PASS:create netns 0 nsec
> >    create_netns:PASS:bring up lo 0 nsec
> >    serial_test_sock_fields:PASS:test_sock_fields__open_and_load 0
> > nsec
> >   
> > serial_test_sock_fields:PASS:attach_cgroup(egress_read_sock_fields)
> > 0
> > nsec
> >   
> > serial_test_sock_fields:PASS:attach_cgroup(ingress_read_sock_fields
> > ) 
> > 0 nsec
> >    serial_test_sock_fields:PASS:attach_cgroup(read_sk_dst_port 0
> > nsec
> >    test:PASS:getsockname(listen_fd) 0 nsec
> >    test:PASS:getsockname(cli_fd) 0 nsec
> >    test:PASS:accept(listen_fd) 0 nsec
> >    init_sk_storage:PASS:bpf_map_update_elem(sk_pkt_out_cnt_fd) 0
> > nsec
> >    init_sk_storage:PASS:bpf_map_update_elem(sk_pkt_out_cnt10_fd) 0
> > nsec
> >    test:PASS:send(accept_fd) 0 nsec
> >    test:PASS:recv(cli_fd) 0 nsec
> >    test:PASS:send(accept_fd) 0 nsec
> >    test:PASS:recv(cli_fd) 0 nsec
> >    test:PASS:recv(accept_fd) for fin 0 nsec
> >    test:PASS:recv(cli_fd) for fin 0 nsec
> >    check_sk_pkt_out_cnt:PASS:bpf_map_lookup_elem(sk_pkt_out_cnt, 
> > &accept_fd) 0 nsec
> >    check_sk_pkt_out_cnt:PASS:bpf_map_lookup_elem(sk_pkt_out_cnt, 
> > &cli_fd) 0 nsec
> >    check_result:PASS:bpf_map_lookup_elem(linum_map_fd) 0 nsec
> >    check_result:PASS:bpf_map_lookup_elem(linum_map_fd) 0 nsec
> >    check_result:PASS:bpf_map_lookup_elem(linum_map_fd, 
> > READ_SK_DST_PORT_IDX) 0 nsec
> >    check_result:FAIL:failure in read_sk_dst_port on line unexpected
> > failure in read_sk_dst_port on line: actual 297 != expected 0
> >    listen_sk: state:10 bound_dev_if:0 family:10 type:1 protocol:6
> > mark:0 
> > priority:0 src_ip4:7f000006(127.0.0.6) src_ip6:0:0:0:1(::1) 
> > src_port:51966 dst_ip4:0(0.0.0.0) dst_ip6:0:0:0:0(::) dst_port:0
> >    srv_sk: state:9 bound_dev_if:0 family:10 type:1 protocol:6
> > mark:0 
> > priority:0 src_ip4:7f000006(127.0.0.6) src_ip6:0:0:0:1(::1) 
> > src_port:51966 dst_ip4:7f000006(127.0.0.6) dst_ip6:0:0:0:1(::1) 
> > dst_port:38030
> >    cli_sk: state:5 bound_dev_if:0 family:10 type:1 protocol:6
> > mark:0 
> > priority:0 src_ip4:7f000006(127.0.0.6) src_ip6:0:0:0:1(::1) 
> > src_port:38030 dst_ip4:0(0.0.0.0) dst_ip6:0:0:0:1(::1)
> > dst_port:51966
> >    listen_tp: snd_cwnd:10 srtt_us:0 rtt_min:4294967295 
> > snd_ssthresh:2147483647 rcv_nxt:0 snd_nxt:0 snd:una:0 mss_cache:536
> > ecn_flags:0 rate_delivered:0 rate_interval_us:0 packets_out:0 
> > retrans_out:0 total_retrans:0 segs_in:0 data_segs_in:0 segs_out:0 
> > data_segs_out:0 lost_out:0 sacked_out:0 bytes_received:0
> > bytes_acked:0
> >    srv_tp: snd_cwnd:10 srtt_us:3904 rtt_min:272
> > snd_ssthresh:2147483647 
> > rcv_nxt:648617715 snd_nxt:4218065810 snd:una:4218065810
> > mss_cache:32768 
> > ecn_flags:0 rate_delivered:1 rate_interval_us:272 packets_out:0 
> > retrans_out:0 total_retrans:0 segs_in:5 data_segs_in:0 segs_out:3 
> > data_segs_out:2 lost_out:0 sacked_out:0 bytes_received:1
> > bytes_acked:22
> >    cli_tp: snd_cwnd:10 srtt_us:6035 rtt_min:730
> > snd_ssthresh:2147483647 
> > rcv_nxt:4218065811 snd_nxt:648617715 snd:una:648617715
> > mss_cache:32768 
> > ecn_flags:0 rate_delivered:1 rate_interval_us:925 packets_out:0 
> > retrans_out:0 total_retrans:0 segs_in:4 data_segs_in:2 segs_out:6 
> > data_segs_out:0 lost_out:0 sacked_out:0 bytes_received:23
> > bytes_acked:2
> >    check_result:PASS:listen_sk 0 nsec
> >    check_result:PASS:srv_sk 0 nsec
> >    check_result:PASS:srv_tp 0 nsec
> > 
> > If bpf program is compiled with llvm16, the test passed according
> > to
> > a CI run.
> > 
> > I don't have s390x environment to debug this. Could you help debug
> > it?
> > 
> > Thanks!
> > 
> >    [1] 
> > https://github.com/kernel-patches/vmtest/actions/runs/4866851496/jobs/8679080985?pr=224#step:6:7645
> 
> 
> Hi,
> 
> thank for letting me know.
> I will look into this.
> 
> Best regards,
> Ilya

In the meantime the issue was fixed by:

commit 141be5c062ecf22bd287afffd310e8ac4711444a
Author: Shoaib Meenai <smeenai@xxxxxx>
Date:   Fri May 5 14:18:12 2023 -0700

    Revert "Reland [Pipeline] Don't limit ArgumentPromotion to -O3"
    
    This reverts commit 6f29d1adf29820daae9ea7a01ae2588b67735b9e.
    
    https://reviews.llvm.org/D149768 is causing size regressions for -
Oz
    with FullLTO, and I'm reverting that one while investigating. This
    commit depends on that one, so it needs to be reverted as well.

But looking at the codegen differences:

$ diff -u <(sed -e s/[0-9]*://g pass.s) <(sed -e s/[0-9]*://g fail.s)

-pass.o:        file format elf64-bpf
+fail.o:        file format elf64-bpf

-00000000000002c8 <sk_dst_port__load_half>
-       69 11 00 30 00 00 00 00 r1 = *(u16 *)(r1 + 48)
+00000000000002c0 <sk_dst_port__load_half>
+       54 10 00 00 00 00 ff ff w1 &= 65535
        b4 00 00 00 00 00 00 01 w0 = 1
        16 10 00 01 00 00 ca fe if w1 == 51966 goto +1 <LBB6_2>
        b4 00 00 00 00 00 00 00 w0 = 0

This is what ArgumentPromotion is supposed to do, so that's okay so
far. However, further down below we have:

 Disassembly of section cgroup_skb/egress:

-       bf 16 00 00 00 00 00 00 r1 = r6
+       61 76 00 30 00 00 00 00 r7 = *(u32 *)(r6 + 48)
+       bc 17 00 00 00 00 00 00 w1 = w7
        85 01 00 00 00 00 00 53 call sk_dst_port__load_word

...

-       bf 16 00 00 00 00 00 00 r1 = r6
+       74 70 00 00 00 00 00 10 w7 >>= 16
+       bc 17 00 00 00 00 00 00 w1 = w7
        85 01 00 00 00 00 00 57 call sk_dst_port__load_half

so there is no 16-bit load anymore, instead, the result from the
earlier 32-bit load is reused. However, on BE these kinds of loads
for this particular field are not consistent at the moment - see [1]
and the previous discussions.

De-facto we have the following results:

- int load: 0x0000cafe
- short load: 0xcafe

On a consistent BE we should have rather had:

- int load: 0x0000cafe
- short load: 0

Clang, of course, expects a consistent BE and optimizes around that.

This was a conscious tradeoff Jakub and I have agreed on in order to
keep the quirky behavior from the past. Given what's happening with
Clang now, I wonder if it would be worth revisiting it in the name of
consistency?

[1]
https://lore.kernel.org/bpf/20220317113920.1068535-5-jakub@xxxxxxxxxxxxxx





[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux