On Sat, Feb 24, 2024 at 08:07:09AM +0000, Christophe Leroy wrote:
> Hello,
>
> I'm seeking your help with an issue reported by BPF CI tests on a core
> BPF patch I provided to improve security in link with
> https://github.com/KSPP/linux/issues/7
>
> I submitted patch
> https://patchwork.kernel.org/project/netdevbpf/patch/135feeafe6fe8d412e90865622e9601403c42be5.1708253445.git.christophe.leroy@xxxxxxxxxx/
>
> As you can see in the checks list, I get "bpf/vmtest-bpf-next-VM_Test-14
> fail Logs for s390x-gcc / test (test_progs, false, 360) / test_progs
> on s390x with gcc "
>
> The output is the one below.
>
> Could you help me understand and fix the issue on S390 ?
>
> Thanks
> Christophe
>
> Output:
>
> ...
> #262 reg_bounds_rand_ranges_u64_u64:OK
> #263 resolve_btfids:OK
> Caught signal #11!
> Stack trace:
> ./test_progs(crash_handler+0x40)[0x2aa090c5ca8]
> linux-vdso64.so.1(__kernel_sigreturn+0x0)[0x3ffc78ce488]
> ./test_progs(ring_buffer__poll+0xc6)[0x2aa0912bbe6]
> ./test_progs(+0x283490)[0x2aa08f83490]
> /lib/s390x-linux-gnu/libpthread.so.0(+0x7e66)[0x3ffb8c07e66]
> /lib/s390x-linux-gnu/libc.so.6(+0xfcd46)[0x3ffb8afcd46]
> [0x0]
>
> test_progs[116] is installing a program with bpf_probe_write_user
> helper that may corrupt user memory!
>
> User process fault: interruption code 003b ilc:2 in
> test_progs[2aa08d00000+b1f000]
>
> Failing address: 0000000000000000 TEID: 0000000000000800
>
> Fault in primary space mode while using user ASCE.
>
> AS:0000000081b381cf R1:0000000081b2c00f R2:0000000081bf400b
> R3:0000000000000024
>
> CPU: 0 PID: 804 Comm: new_name Tainted: G OE
> 6.8.0-rc1-g690b912d8bb7-dirty #215
>
> Hardware name: IBM 8561 LT1 400 (KVM/Linux)
>
> User PSW : 0705000180000000 000002aa0912bbe6
>
> R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:1 AS:0 CC:0 PM:0 RI:0 EA:3
>
> User GPRS: 0000000000000000 0000000000000000 0000000000000000
> 000002aa0dbcf5c0
>
> ffffffff00000000 0000000000002710 000003ffb6d00900
> 000003ffc7877ad7
>
> 000003ffb6d001e0 000003ffc7877ad6 000003ffc7877ad8
> 000003ffb6cffe50
>
> 000003ffb8e26f88 000003ffb6d00900 000002aa0912bb88
> 000003ffb6cffe50
>
> User Code: 000002aa0912bbd6: e310b0b40014 lgf %r1,180(%r11)
> 000002aa0912bbdc: eb110004000d sllg %r1,%r1,4
> #000002aa0912bbe2: b9080012 agr %r1,%r2
> >000002aa0912bbe6: 58101008 l %r1,8(%r1)
> 000002aa0912bbea: 5010b0bc st %r1,188(%r11)
> 000002aa0912bbee: e310b0a80004 lg %r1,168(%r11)
> 000002aa0912bbf4: e32010080004 lg %r2,8(%r1)
> 000002aa0912bbfa: e310b0bc0016 llgf %r1,188(%r11)
>
> Last Breaking-Event-Address:
>
> [<0000000000000001>] test_progs[2aa08d00000+b1f000]
> ./ci/vmtest/vmtest_selftests.sh: line 69: 116 Segmentation fault
> ./${selftest} ${args} ${DENYLIST:+-d"$DENYLIST"}
> ${ALLOWLIST:+-a"$ALLOWLIST"} --json-summary "${json_file}"
> bash: line 5: cd: /tmp/work/bpf/bpf: No such file or directory

Hi,

I think this is an intermittent failure that has nothing to do with
your patch. I could not reproduce it after a few hundred iterations of
the ringbuf test.

However, I have taken a look at the code and I think I know what is
happening here. The test starts poll_thread(), and later calls only
pthread_tryjoin_np(). When a test machine is overloaded, the thread
startup may be arbitrarily delayed, and it looks as if in this case
it's really started only after ring_buffer__free().

So I would suggest replacing the final pthread_tryjoin_np() with
pthread_join().

Best regards,
Ilya
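
P.S. To illustrate the ordering problem, here is a minimal standalone
sketch. It is not the selftest code: struct fake_ringbuf, struct
poller_arg, the gate semaphore and join_then_free() are hypothetical
names made up for this example, and the semaphore only models a thread
whose startup is delayed. It shows that pthread_tryjoin_np() returns
EBUSY while the thread is still pending, so freeing at that point would
race, whereas pthread_join() waits for the thread to finish first.

```c
#define _GNU_SOURCE /* pthread_tryjoin_np() is a glibc extension */
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <stdlib.h>

/* Hypothetical stand-in for the ring buffer that the poller dereferences. */
struct fake_ringbuf {
	int *slots;
};

struct poller_arg {
	struct fake_ringbuf *rb;
	sem_t gate;	/* models a thread whose startup is delayed */
	int result;
};

static void *poll_thread(void *p)
{
	struct poller_arg *arg = p;

	sem_wait(&arg->gate);			/* "not scheduled yet" */
	arg->result = arg->rb->slots[0];	/* valid only while slots is allocated */
	return NULL;
}

/*
 * Runs the poller and tears down in a safe order. Stores the status of the
 * non-blocking pthread_tryjoin_np() in *tryjoin_status and returns the value
 * the poller read, or -1 on setup failure.
 */
static int join_then_free(int *tryjoin_status)
{
	struct fake_ringbuf rb;
	struct poller_arg arg = { .rb = &rb, .result = -1 };
	pthread_t thread;

	assert(tryjoin_status != NULL);

	rb.slots = calloc(4, sizeof(*rb.slots));
	if (!rb.slots)
		return -1;
	rb.slots[0] = 42;

	if (sem_init(&arg.gate, 0, 0) != 0) {
		free(rb.slots);
		return -1;
	}
	if (pthread_create(&thread, NULL, poll_thread, &arg) != 0) {
		sem_destroy(&arg.gate);
		free(rb.slots);
		return -1;
	}

	/*
	 * Non-blocking check: EBUSY means the thread has not terminated, so
	 * freeing rb.slots right here would leave it with a dangling pointer.
	 */
	*tryjoin_status = pthread_tryjoin_np(thread, NULL);

	sem_post(&arg.gate);	/* the delayed thread finally gets to run */

	/* Blocking join: returns only once poll_thread() has finished. */
	if (*tryjoin_status != 0)
		pthread_join(thread, NULL);

	sem_destroy(&arg.gate);
	free(rb.slots);
	return arg.result;
}
```

Calling join_then_free(&status) from main() and building with
"gcc -pthread" should leave status set to EBUSY. Moving free() to just
after the pthread_tryjoin_np() call would reproduce the kind of
use-after-free I suspect in the test.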