On 5/12/23 7:40 AM, Ilya Leoshkevich wrote:
On Wed, 2023-05-03 at 21:46 +0200, Ilya Leoshkevich wrote:
On Wed, 2023-05-03 at 12:35 -0700, Yonghong Song wrote:
Hi, Ilya,
BPF CI ([1]) detected a s390x failure when bpf program is compiled
with
latest llvm17 on bpf-next branch. To reproduce the issue, just run
normal './test_progs -j'. The failure log looks like below:
Notice: Success: 341/3015, Skipped: 29, Failed: 1
Error: #191 sock_fields
Error: #191 sock_fields
create_netns:PASS:create netns 0 nsec
create_netns:PASS:bring up lo 0 nsec
serial_test_sock_fields:PASS:test_sock_fields__open_and_load 0
nsec
serial_test_sock_fields:PASS:attach_cgroup(egress_read_sock_fields)
0
nsec
serial_test_sock_fields:PASS:attach_cgroup(ingress_read_sock_fields
)
0 nsec
serial_test_sock_fields:PASS:attach_cgroup(read_sk_dst_port 0
nsec
test:PASS:getsockname(listen_fd) 0 nsec
test:PASS:getsockname(cli_fd) 0 nsec
test:PASS:accept(listen_fd) 0 nsec
init_sk_storage:PASS:bpf_map_update_elem(sk_pkt_out_cnt_fd) 0
nsec
init_sk_storage:PASS:bpf_map_update_elem(sk_pkt_out_cnt10_fd) 0
nsec
test:PASS:send(accept_fd) 0 nsec
test:PASS:recv(cli_fd) 0 nsec
test:PASS:send(accept_fd) 0 nsec
test:PASS:recv(cli_fd) 0 nsec
test:PASS:recv(accept_fd) for fin 0 nsec
test:PASS:recv(cli_fd) for fin 0 nsec
check_sk_pkt_out_cnt:PASS:bpf_map_lookup_elem(sk_pkt_out_cnt,
&accept_fd) 0 nsec
check_sk_pkt_out_cnt:PASS:bpf_map_lookup_elem(sk_pkt_out_cnt,
&cli_fd) 0 nsec
check_result:PASS:bpf_map_lookup_elem(linum_map_fd) 0 nsec
check_result:PASS:bpf_map_lookup_elem(linum_map_fd) 0 nsec
check_result:PASS:bpf_map_lookup_elem(linum_map_fd,
READ_SK_DST_PORT_IDX) 0 nsec
check_result:FAIL:failure in read_sk_dst_port on line unexpected
failure in read_sk_dst_port on line: actual 297 != expected 0
listen_sk: state:10 bound_dev_if:0 family:10 type:1 protocol:6
mark:0
priority:0 src_ip4:7f000006(127.0.0.6) src_ip6:0:0:0:1(::1)
src_port:51966 dst_ip4:0(0.0.0.0) dst_ip6:0:0:0:0(::) dst_port:0
srv_sk: state:9 bound_dev_if:0 family:10 type:1 protocol:6
mark:0
priority:0 src_ip4:7f000006(127.0.0.6) src_ip6:0:0:0:1(::1)
src_port:51966 dst_ip4:7f000006(127.0.0.6) dst_ip6:0:0:0:1(::1)
dst_port:38030
cli_sk: state:5 bound_dev_if:0 family:10 type:1 protocol:6
mark:0
priority:0 src_ip4:7f000006(127.0.0.6) src_ip6:0:0:0:1(::1)
src_port:38030 dst_ip4:0(0.0.0.0) dst_ip6:0:0:0:1(::1)
dst_port:51966
listen_tp: snd_cwnd:10 srtt_us:0 rtt_min:4294967295
snd_ssthresh:2147483647 rcv_nxt:0 snd_nxt:0 snd:una:0 mss_cache:536
ecn_flags:0 rate_delivered:0 rate_interval_us:0 packets_out:0
retrans_out:0 total_retrans:0 segs_in:0 data_segs_in:0 segs_out:0
data_segs_out:0 lost_out:0 sacked_out:0 bytes_received:0
bytes_acked:0
srv_tp: snd_cwnd:10 srtt_us:3904 rtt_min:272
snd_ssthresh:2147483647
rcv_nxt:648617715 snd_nxt:4218065810 snd:una:4218065810
mss_cache:32768
ecn_flags:0 rate_delivered:1 rate_interval_us:272 packets_out:0
retrans_out:0 total_retrans:0 segs_in:5 data_segs_in:0 segs_out:3
data_segs_out:2 lost_out:0 sacked_out:0 bytes_received:1
bytes_acked:22
cli_tp: snd_cwnd:10 srtt_us:6035 rtt_min:730
snd_ssthresh:2147483647
rcv_nxt:4218065811 snd_nxt:648617715 snd:una:648617715
mss_cache:32768
ecn_flags:0 rate_delivered:1 rate_interval_us:925 packets_out:0
retrans_out:0 total_retrans:0 segs_in:4 data_segs_in:2 segs_out:6
data_segs_out:0 lost_out:0 sacked_out:0 bytes_received:23
bytes_acked:2
check_result:PASS:listen_sk 0 nsec
check_result:PASS:srv_sk 0 nsec
check_result:PASS:srv_tp 0 nsec
If bpf program is compiled with llvm16, the test passed according
to
a CI run.
I don't have s390x environment to debug this. Could you help debug
it?
Thanks!
[1]
https://github.com/kernel-patches/vmtest/actions/runs/4866851496/jobs/8679080985?pr=224#step:6:7645
Hi,
thank for letting me know.
I will look into this.
Best regards,
Ilya
In the meantime the issue was fixed by:
commit 141be5c062ecf22bd287afffd310e8ac4711444a
Author: Shoaib Meenai <smeenai@xxxxxx>
Date: Fri May 5 14:18:12 2023 -0700
Revert "Reland [Pipeline] Don't limit ArgumentPromotion to -O3"
This reverts commit 6f29d1adf29820daae9ea7a01ae2588b67735b9e.
https://reviews.llvm.org/D149768 is causing size regressions for -
Oz
with FullLTO, and I'm reverting that one while investigating. This
commit depends on that one, so it needs to be reverted as well.
The transformtion "Don't limit ArgumentPromotion to -O3" is temporarily
reverted. But it could be reverted again once the issue is resolved.
So it is a good idea to resolve the issue in the kernel.
But looking at the codegen differences:
$ diff -u <(sed -e s/[0-9]*://g pass.s) <(sed -e s/[0-9]*://g fail.s)
-pass.o: file format elf64-bpf
+fail.o: file format elf64-bpf
-00000000000002c8 <sk_dst_port__load_half>
- 69 11 00 30 00 00 00 00 r1 = *(u16 *)(r1 + 48)
+00000000000002c0 <sk_dst_port__load_half>
+ 54 10 00 00 00 00 ff ff w1 &= 65535
b4 00 00 00 00 00 00 01 w0 = 1
16 10 00 01 00 00 ca fe if w1 == 51966 goto +1 <LBB6_2>
b4 00 00 00 00 00 00 00 w0 = 0
This is what ArgumentPromotion is supposed to do, so that's okay so
far. However, further down below we have:
Disassembly of section cgroup_skb/egress:
- bf 16 00 00 00 00 00 00 r1 = r6
+ 61 76 00 30 00 00 00 00 r7 = *(u32 *)(r6 + 48)
+ bc 17 00 00 00 00 00 00 w1 = w7
85 01 00 00 00 00 00 53 call sk_dst_port__load_word
...
- bf 16 00 00 00 00 00 00 r1 = r6
+ 74 70 00 00 00 00 00 10 w7 >>= 16
+ bc 17 00 00 00 00 00 00 w1 = w7
85 01 00 00 00 00 00 57 call sk_dst_port__load_half
so there is no 16-bit load anymore, instead, the result from the
earlier 32-bit load is reused. However, on BE these kinds of loads
for this particular field are not consistent at the moment - see [1]
and the previous discussions.
De-facto we have the following results:
- int load: 0x0000cafe
- short load: 0xcafe
So 'De-facto' means the above is the expected result.
On a consistent BE we should have rather had:
- int load: 0x0000cafe
- short load: 0
What does 'consistent BE' mean here? Does it mean the expected
result from the source code itself?
Clang, of course, expects a consistent BE and optimizes around that.
This was a conscious tradeoff Jakub and I have agreed on in order to
keep the quirky behavior from the past. Given what's happening with
Clang now, I wonder if it would be worth revisiting it in the name of
consistency?
If I understand correctly, I think the issue is
r7 = *(u32 *)(r6 + 48)
w7 >= 16
w1 = w7
after verifier, it is changed to
r7 = *(u16 *)(r6 + <kernel offset>)
w7 >= 16
w1 = w7
and the result after verifier rewrite is completely wrong.
Is it right?
If this is the case, during verifier rewrite, if it is
big endian, say the user intends to load 4 bytes (from uapi header)
while the kernel field is 2 bytes, in such cases, kernel
can still pretend to generate 4-byte load. For example,
for the above example, the code after verification could be:
r7 = *(u16 *)(r6 + <kernel offset>)
r7 <= 16
w7 >= 16
w1 = w7
Hopefully, we won't have many such cases. Does this work?
[1]
https://lore.kernel.org/bpf/20220317113920.1068535-5-jakub@xxxxxxxxxxxxxx