Re: Help needed for a BPF kernel issue with S390

Ilya Leoshkevich <iii@xxxxxxxxxxxxx> · Sat, 24 Feb 2024 12:31:05 +0100

On Sat, Feb 24, 2024 at 08:07:09AM +0000, Christophe Leroy wrote:
> Hello,
> 
> I'm seeking your help with an issue reported by the BPF CI tests on a core 
> BPF patch I submitted to improve security, in connection with 
> https://github.com/KSPP/linux/issues/7
> 
> I submitted patch 
> https://patchwork.kernel.org/project/netdevbpf/patch/135feeafe6fe8d412e90865622e9601403c42be5.1708253445.git.christophe.leroy@xxxxxxxxxx/
> 
> As you can see in the checks list, I get "bpf/vmtest-bpf-next-VM_Test-14 
> 	fail 	Logs for s390x-gcc / test (test_progs, false, 360) / test_progs 
> on s390x with gcc "
> 
> The output is the one below.
> 
> Could you help me understand and fix the issue on s390?
> 
> Thanks
> Christophe
> 
> Output:
> 
> ...
>    #262     reg_bounds_rand_ranges_u64_u64:OK
>    #263     resolve_btfids:OK
>    Caught signal #11!
>    Stack trace:
>    ./test_progs(crash_handler+0x40)[0x2aa090c5ca8]
>    linux-vdso64.so.1(__kernel_sigreturn+0x0)[0x3ffc78ce488]
>    ./test_progs(ring_buffer__poll+0xc6)[0x2aa0912bbe6]
>    ./test_progs(+0x283490)[0x2aa08f83490]
>    /lib/s390x-linux-gnu/libpthread.so.0(+0x7e66)[0x3ffb8c07e66]
>    /lib/s390x-linux-gnu/libc.so.6(+0xfcd46)[0x3ffb8afcd46]
>    [0x0]
> 
>    test_progs[116] is installing a program with bpf_probe_write_user 
> helper that may corrupt user memory!
> 
>    User process fault: interruption code 003b ilc:2 in 
> test_progs[2aa08d00000+b1f000]
> 
>    Failing address: 0000000000000000 TEID: 0000000000000800
> 
>    Fault in primary space mode while using user ASCE.
> 
>    AS:0000000081b381cf R1:0000000081b2c00f R2:0000000081bf400b 
> R3:0000000000000024
> 
>    CPU: 0 PID: 804 Comm: new_name Tainted: G           OE 
> 6.8.0-rc1-g690b912d8bb7-dirty #215
> 
>    Hardware name: IBM 8561 LT1 400 (KVM/Linux)
> 
>    User PSW : 0705000180000000 000002aa0912bbe6
> 
>               R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:1 AS:0 CC:0 PM:0 RI:0 EA:3
> 
>    User GPRS: 0000000000000000 0000000000000000 0000000000000000 
> 000002aa0dbcf5c0
> 
>               ffffffff00000000 0000000000002710 000003ffb6d00900 
> 000003ffc7877ad7
> 
>               000003ffb6d001e0 000003ffc7877ad6 000003ffc7877ad8 
> 000003ffb6cffe50
> 
>               000003ffb8e26f88 000003ffb6d00900 000002aa0912bb88 
> 000003ffb6cffe50
> 
>    User Code: 000002aa0912bbd6: e310b0b40014	lgf	%r1,180(%r11)
>               000002aa0912bbdc: eb110004000d	sllg	%r1,%r1,4
>              #000002aa0912bbe2: b9080012		agr	%r1,%r2
>              >000002aa0912bbe6: 58101008		l	%r1,8(%r1)
>               000002aa0912bbea: 5010b0bc		st	%r1,188(%r11)
>               000002aa0912bbee: e310b0a80004	lg	%r1,168(%r11)
>               000002aa0912bbf4: e32010080004	lg	%r2,8(%r1)
>               000002aa0912bbfa: e310b0bc0016	llgf	%r1,188(%r11)
> 
>    Last Breaking-Event-Address:
> 
>     [<0000000000000001>] test_progs[2aa08d00000+b1f000]
>    ./ci/vmtest/vmtest_selftests.sh: line 69:   116 Segmentation fault 
>    ./${selftest} ${args} ${DENYLIST:+-d"$DENYLIST"} 
> ${ALLOWLIST:+-a"$ALLOWLIST"} --json-summary "${json_file}"
> bash: line 5: cd: /tmp/work/bpf/bpf: No such file or directory

Hi,

I think this is an intermittent failure that has nothing to do with
your patch. I could not reproduce it in a few hundred iterations of
the ringbuf test.

However, I have taken a look at the code and I think I know what is
happening here. The test starts poll_thread(), and later calls only
pthread_tryjoin_np(), which does not wait for the thread to finish.
When a test machine is overloaded, the thread startup may be
arbitrarily delayed, and it looks as if in this case the thread really
started running only after ring_buffer__free(), so ring_buffer__poll()
touched an already freed ring buffer.

So I would suggest replacing the final pthread_tryjoin_np() with
pthread_join().
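
To illustrate the ordering I have in mind, here is a minimal sketch. It
is not the selftest code: struct fake_ringbuf, poll_thread_sketch() and
poll_then_free() are hypothetical stand-ins I made up for this mail, and
the single write in the thread only plays the role of ring_buffer__poll().
The point is just that the blocking join comes before the free.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for the ring buffer that the poll thread uses. */
struct fake_ringbuf {
	int polls;
};

/* Hypothetical stand-in for poll_thread(): touches the buffer once. */
static void *poll_thread_sketch(void *arg)
{
	struct fake_ringbuf *rb = arg;

	rb->polls++;
	return NULL;
}

/*
 * Start the thread, wait for it, and only then free the buffer.
 * Returns the number of polls the thread performed, or -1 on error.
 */
static int poll_then_free(void)
{
	struct fake_ringbuf *rb = calloc(1, sizeof(*rb));
	pthread_t thread;
	int polls;

	if (!rb)
		return -1;

	if (pthread_create(&thread, NULL, poll_thread_sketch, rb)) {
		free(rb);
		return -1;
	}

	/*
	 * A non-blocking pthread_tryjoin_np() could return EBUSY here if
	 * the thread has not finished (or not even started) yet, and the
	 * free() below would then race with the thread. pthread_join()
	 * blocks until the thread has exited, so the free() is safe.
	 */
	if (pthread_join(thread, NULL))
		return -1; /* rb is leaked on purpose: the thread may still use it */

	/* The join orders the thread's write before this read. */
	polls = rb->polls;
	free(rb);
	return polls;
}
```

Calling poll_then_free() should return 1 no matter how long the thread
startup is delayed, because the buffer outlives the thread.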

Best regards,
Ilya