On 2024-12-20 15:56, John Fastabend wrote:
> Björn Töpel wrote:
>> Björn Töpel <bjorn@xxxxxxxxxx> writes:
>>> Levi Zim <rsworktech@xxxxxxxxxxx> writes:
>>>> On 2024-12-04 09:01, Cong Wang wrote:
>>>>> On Sun, Dec 01, 2024 at 09:42:08AM +0800, Levi Zim wrote:
>>>>>> On 2024-11-30 21:38, Levi Zim via B4 Relay wrote:
>>>>>>> I found that bpf kselftest sockhash::test_txmsg_cork_hangs in test_sockmap.c triggers a kernel NULL pointer dereference:
>>>>>
>>>>> Interesting, I also ran this test recently and I didn't see such a crash.
>>>>
>>>> I am also curious about why other people or the CI didn't hit such crash.
>>>
>>> FWIW, I'm hitting it on RISC-V:
>>>
>>> | Unable to handle kernel access to user memory without uaccess routines at virtual address 0000000000000008
>>> | Oops [#1]
>>> | Modules linked in: sch_fq_codel drm fuse drm_panel_orientation_quirks backlight
>>> | CPU: 7 UID: 0 PID: 732 Comm: test_sockmap Not tainted 6.13.0-rc3-00017-gf44d154d6e3d #1
>>> | Hardware name: riscv-virtio qemu/qemu, BIOS 2025.01-rc3-00042-gacab6e78aca7 01/01/2025
>>> | epc : splice_to_socket+0x376/0x49a
>>> | ra : splice_to_socket+0x37c/0x49a
>>> | epc : ffffffff803d9ffc ra : ffffffff803da002 sp : ff20000001c3b8b0
>>> | gp : ffffffff827aefa8 tp : ff60000083450040 t0 : ff6000008a12d001
>>> | t1 : 0000100100001001 t2 : 0000000000000000 s0 : ff20000001c3bae0
>>> | s1 : ffffffffffffefff a0 : ff6000008245e200 a1 : ff60000087dd0450
>>> | a2 : 0000000000000000 a3 : 0000000000000000 a4 : 0000000000000000
>>> | a5 : 0000000000000000 a6 : ff20000001c3b450 a7 : ff6000008a12c004
>>> | s2 : 000000000000000f s3 : ff6000008245e2d0 s4 : ff6000008245e280
>>> | s5 : 0000000000000000 s6 : 0000000000000002 s7 : 0000000000001001
>>> | s8 : 0000000000003001 s9 : 0000000000000002 s10: 0000000000000002
>>> | s11: ff6000008245e200 t3 : ffffffff8001e78c t4 : 0000000000000000
>>> | t5 : 0000000000000000 t6 : ff6000008869f230
>>> | status: 0000000200000120 badaddr: 0000000000000008 cause: 000000000000000d
>>> | [<ffffffff803d9ffc>] splice_to_socket+0x376/0x49a
>>> | [<ffffffff803d8bc0>] direct_splice_actor+0x44/0x216
>>> | [<ffffffff803d8532>] splice_direct_to_actor+0xb6/0x1e8
>>> | [<ffffffff803d8780>] do_splice_direct+0x70/0xa2
>>> | [<ffffffff80392e40>] do_sendfile+0x26e/0x2d4
>>> | [<ffffffff803939d4>] __riscv_sys_sendfile64+0xf2/0x10e
>>> | [<ffffffff80fdfb64>] do_trap_ecall_u+0x1f8/0x26c
>>> | [<ffffffff80fedaee>] _new_vmalloc_restore_context_a0+0xc6/0xd2
>>> | Code: c5d8 9e35 c590 8bb3 40db eb01 6998 b823 0005 856e (6718) 2d05
>>> | ---[ end trace 0000000000000000 ]---
>>> | Kernel panic - not syncing: Fatal exception
>>> | SMP: stopping secondary CPUs
>>> | ---[ end Kernel panic - not syncing: Fatal exception ]---
>>>
>>> This is commit f44d154d6e3d ("Merge tag 'soc-fixes-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc"). (Yet to bisect!)
>>
>> Took the series for a run, and it does solve crash, but I'm getting additional failures:
>
> Hi Bjorn,
>
> Thanks! I'm guessing those tests were failing even without the patch though right?

IIRC those kTLS tests were failing when I manually commented out the cork hangs test that crashes the kernel.

> Thanks,
> John
>
>> | [TEST 298]: (512, 1, 3, sendpage, pass,pop (1,3),ktls,): socket(peer2) kTLS enabled
>> | socket(client1) kTLS enabled
>> | recv failed(): Invalid argument
>> | rx thread exited with err 1.
>> | FAILED
>> | [TEST 299]: (100, 1, 5, sendpage, pass,pop (1,3),ktls,): socket(peer2) kTLS enabled
>> | socket(client1) kTLS enabled
>> | recv failed(): Invalid argument
>> | rx thread exited with err 1.
>> | FAILED
>> | [TEST 300]: (2, 32, 8192, sendpage, pass,pop (4096,8192),ktls,): socket(peer2) kTLS enabled
>> | socket(client1) kTLS enabled
>> | recv failed(): Bad message
>> | rx thread exited with err 1.
>> | FAILED
>> | ...
>> | #42/ 9 sockhash:ktls:txmsg test pop-data:FAIL
>> | ...
>> | [TEST 308]: (2, 32, 8192, sendpage, pass,pop (5,21),ktls,): socket(peer2) kTLS enabled
>> | socket(client1) kTLS enabled
>> | recv failed(): Bad message
>> | rx thread exited with err 1.
>> | FAILED
>> | [TEST 309]: (2, 32, 8192, sendpage, pass,pop (1,11),ktls,): socket(peer2) kTLS enabled
>> | socket(client1) kTLS enabled
>> | recv failed(): Bad message
>> | rx thread exited with err 1.
>> | FAILED
>> | ...
>> | #43/ 6 sockhash:ktls:txmsg test push/pop data:FAIL