On Tue, Feb 13, 2024 at 12:34 AM Kumar Kartikeya Dwivedi <memxor@xxxxxxxxx> wrote:
>
> On Tue, 13 Feb 2024 at 01:21, Yan Zhai <yan@xxxxxxxxxxxxxx> wrote:
> >
> > On Mon, Feb 12, 2024 at 5:52 PM Alexei Starovoitov
> > <alexei.starovoitov@xxxxxxxxx> wrote:
> > >
> > > On Mon, Feb 12, 2024 at 3:42 PM Kumar Kartikeya Dwivedi
> > > <memxor@xxxxxxxxx> wrote:
> > > >
> > > > On Tue, 13 Feb 2024 at 00:34, Alexei Starovoitov
> > > > <alexei.starovoitov@xxxxxxxxx> wrote:
> > > > >
> > > > > On Mon, Feb 12, 2024 at 3:16 PM Ignat Korchagin <ignat@xxxxxxxxxxxxxx> wrote:
> > > > > >
> > > > > > [288931.217143][T109754] CPU: 4 PID: 109754 Comm: bpftrace Not tainted
> > > > > > 6.6.16+ #10
> > > > >
> > > > > ...
> > > > >
> > > > > > [288931.217143][T109754] ? copy_from_kernel_nofault+0x1d/0xe0
> > > > > > [288931.217143][T109754] bpf_probe_read_compat+0x6a/0x90
> > > > > >
> > > > > > And Jakub, CCed here, did it for 6.8.0-rc2+
> > > > >
> > > > > I suspect something is broken in your kernels.
> > > > > The above is doing a generic copy_from_kernel_nofault(),
> > > > > so one should be able to crash the kernel without any bpf.
> > > > >
> > > > > We have this in selftests/bpf:
> > > > >
> > > > > __weak noinline struct file *bpf_testmod_return_ptr(int arg)
> > > > > {
> > > > >         static struct file f = {};
> > > > >
> > > > >         switch (arg) {
> > > > >         case 1: return (void *)EINVAL;          /* user addr */
> > > > >         case 2: return (void *)0xcafe4a11;      /* user addr */
> > > > >         case 3: return (void *)-EINVAL;         /* canonical, but invalid */
> > > > >         case 4: return (void *)(1ull << 60);    /* non-canonical and invalid */
> > > > >         case 5: return (void *)~(1ull << 30);   /* trigger extable */
> > > > >         case 6: return &f;                      /* valid addr */
> > > > >         case 7: return (void *)((long)&f | 1);  /* kernel tricks */
> > > > >         default: return NULL;
> > > > >         }
> > > > > }
> > > > >
> > > > > where we check that the extables set up by the JIT for bpf progs are
> > > > > working correctly. You should see the kernel crashing when you just
> > > > > run the bpf selftests.
> > > >
> > > > I agree, this appears unrelated to BPF since it is happening when
> > > > using copy_from_kernel_nofault (which should jump to the Efault
> > > > label instead of oopsing), but I think it's not specific to some
> > > > custom kernel. I can reproduce it on my dev machine on top of bpf-next
> > > > as well, and on another machine with Ubuntu's generic 6.5 kernel for
> > > > 24.04. And I think Ignat tried it on mainline 6.8-rc2 as well.
> >
> > copy_from_kernel_nofault is called in Jakub's reproducer, but the
> > panic case in our production seems to be direct memory access,
> > according to the JITed code dumped by bpftool. Will faults from such
> > instructions also be caught correctly?
>
> Yep, since faults in both cases end up in the page fault handler.
> Once the fix pointed out by Alexei is applied, it should address both scenarios.
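(For context: the bpf_probe_read_compat() frame in the trace above boils down to copy_from_kernel_nofault(). A minimal sketch of that path, simplified from mainline kernel/trace/bpf_trace.c; the wrapper name here is illustrative, not the kernel's:

    /*
     * Sketch of the path at issue, simplified from
     * bpf_probe_read_kernel_common() in kernel/trace/bpf_trace.c.
     * bpf_probe_read() and friends reduce to copy_from_kernel_nofault(),
     * which is supposed to return -EFAULT via an exception-table fixup
     * rather than oops when handed a bad address.
     */
    #include <linux/string.h>
    #include <linux/uaccess.h>

    static long probe_read_kernel_sketch(void *dst, size_t size,
                                         const void *unsafe_ptr)
    {
            long ret = copy_from_kernel_nofault(dst, unsafe_ptr, size);

            if (ret < 0)
                    memset(dst, 0, size);   /* zero the buffer on fault */
            return ret;
    }

The whole thread hinges on that expectation: a bad address fed to this path should surface as -EFAULT to the bpf program, never as an oops.)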
Just as a follow-up: the patches do seem to help on x86, but we've recently encountered a similar problem on arm64 (6.1.74 kernel):

[Wed Feb 21 12:06:33 2024] Unable to handle kernel access to user memory outside uaccess routines at virtual address 00007fff9959b150
[Wed Feb 21 12:06:33 2024] Mem abort info:
[Wed Feb 21 12:06:33 2024]   ESR = 0x000000009600000f
[Wed Feb 21 12:06:33 2024]   EC = 0x25: DABT (current EL), IL = 32 bits
[Wed Feb 21 12:06:33 2024]   SET = 0, FnV = 0
[Wed Feb 21 12:06:33 2024]   EA = 0, S1PTW = 0
[Wed Feb 21 12:06:33 2024]   FSC = 0x0f: level 3 permission fault
[Wed Feb 21 12:06:33 2024] Data abort info:
[Wed Feb 21 12:06:33 2024]   ISV = 0, ISS = 0x0000000f
[Wed Feb 21 12:06:33 2024]   CM = 0, WnR = 0
[Wed Feb 21 12:06:33 2024] user pgtable: 4k pages, 48-bit VAs, pgdp=00000812b1f69000
[Wed Feb 21 12:06:33 2024] [00007fff9959b150] pgd=08000812b1f72003, p4d=08000812b1f72003, pud=08000812b1ff2003, pmd=08000855b2eb4003, pte=0068087760598fc3
[Wed Feb 21 12:06:33 2024] Internal error: Oops: 000000009600000f [#1] SMP
[Wed Feb 21 12:06:33 2024] Modules linked in: nft_compat xt_hashlimit ip_set_hash_netport xt_length esp4 nf_conntrack_netlink zstd zstd_compress zram zsmalloc xgene_edac dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio nft_fwd_netdev nf_dup_netdev xfrm_interface xfrm6_tunnel mpls_gso mpls_iptunnel mpls_router sit nft_numgen nft_log nft_limit dummy ipip tunnel4 xfrm_user xfrm_algo nft_ct iptable_raw iptable_nat iptable_mangle ipt_REJECT nf_reject_ipv4 ip6table_security xt_CT ip6table_raw xt_nat ip6table_nat nf_nat xt_TCPMSS xt_owner xt_NFLOG xt_connbytes xt_connlabel xt_statistic xt_connmark ip6table_mangle xt_limit xt_LOG nf_log_syslog xt_mark xt_tcpudp xt_conntrack ip6t_REJECT nf_reject_ipv6 xt_multiport xt_set xt_tcpmss xt_comment ip6table_filter ip6_tables iptable_filter nfnetlink_log tcp_diag cls_bpf sch_ingress ip_gre gre geneve tun xt_bpf nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 fou6 fou ip_tunnel ip6_udp_tunnel udp_tunnel ip6_tunnel tunnel6 veth nf_tables tcp_bbr sch_fq
[Wed Feb 21 12:06:33 2024]  ip_set_hash_ip ip_set_hash_net ip_set nfnetlink udp_diag inet_diag raid0 md_mod dm_crypt trusted asn1_encoder tee algif_skcipher af_alg 8021q garp mrp stp llc nvme_fabrics crct10dif_ce ghash_ce acpi_ipmi mlx5_core sha2_ce ipmi_ssif sha256_arm64 sha1_ce mlxfw ipmi_devintf arm_spe_pmu tiny_power_button tls igb xhci_pci nvme psample nvme_core xhci_hcd ipmi_msghandler i2c_algo_bit button i2c_designware_platform i2c_designware_core cppc_cpufreq arm_dsu_pmu tpm_tis tpm_tis_core fuse dm_mod dax efivarfs ip_tables x_tables bcmcrypt(O) aes_neon_bs aes_neon_blk aes_ce_blk aes_ce_cipher [last unloaded: kheaders]
[Wed Feb 21 12:06:33 2024] CPU: 15 PID: 547138 Comm: nginx-ssl Tainted: G O 6.1.74-cloudflare-2024.1.14 #1
[Wed Feb 21 12:06:33 2024] Hardware name: GIGABYTE
[Wed Feb 21 12:06:33 2024] pstate: 20400009 (nzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[Wed Feb 21 12:06:33 2024] pc : 0xffff8000288c0674
[Wed Feb 21 12:06:33 2024] lr : 0xffff8000288c064c
[Wed Feb 21 12:06:33 2024] sp : ffff8000afdd3940
[Wed Feb 21 12:06:33 2024] x29: ffff8000afdd39d0 x28: ffff081142f99f80 x27: ffff8000afdd3940
[Wed Feb 21 12:06:33 2024] x26: 0000000000000000 x25: ffff8000afdd3990 x24: 0000000000000001
[Wed Feb 21 12:06:33 2024] x23: 000000002e4773f7 x22: ffff0800e7078300 x21: ffff08378b4c5180
[Wed Feb 21 12:06:33 2024] x20: 0000000000000000 x19: fffffbff5dc7d548 x18: 0000000000000000
[Wed Feb 21 12:06:33 2024] x17: 0000000000000000 x16: 0000000000000000 x15: ffff081b6e9e8196
[Wed Feb 21 12:06:33 2024] x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000
[Wed Feb 21 12:06:33 2024] x11: 0000000000000000 x10: ffffda25e4cc90f0 x9 : ffffda25e4d71074
[Wed Feb 21 12:06:33 2024] x8 : ffff8000afdd3af8 x7 : 0000000000000000 x6 : 0000008124f0e5a3
[Wed Feb 21 12:06:33 2024] x5 : ffff80023c9cd000 x4 : 0000000000001000 x3 : 0000000000000008
[Wed Feb 21 12:06:33 2024] x2 : ffff081142f99f80 x1 : ffffda25e55e76a0 x0 : 00007fff9959a2d0
[Wed Feb 21 12:06:33 2024] Call trace:
[Wed Feb 21 12:06:33 2024]  0xffff8000288c0674
[Wed Feb 21 12:06:33 2024]  bpf_trace_run3+0xcc/0x148
[Wed Feb 21 12:06:34 2024]  __bpf_trace_kfree_skb+0x14/0x20
[Wed Feb 21 12:06:34 2024]  __traceiter_kfree_skb+0x50/0x78
[Wed Feb 21 12:06:34 2024]  kfree_skb_reason+0xa8/0x118
[Wed Feb 21 12:06:34 2024]  tcp_data_queue+0x9f8/0xe20
[Wed Feb 21 12:06:34 2024]  tcp_rcv_established+0x2b4/0x738
[Wed Feb 21 12:06:34 2024]  tcp_v4_do_rcv+0x194/0x2d8
[Wed Feb 21 12:06:34 2024]  __release_sock+0x90/0x138
[Wed Feb 21 12:06:34 2024]  release_sock+0x64/0x120
[Wed Feb 21 12:06:34 2024]  tcp_recvmsg+0x80/0x1c8
[Wed Feb 21 12:06:34 2024]  inet_recvmsg+0x50/0xf8
[Wed Feb 21 12:06:34 2024]  sock_read_iter+0xf4/0x128
[Wed Feb 21 12:06:34 2024]  vfs_read+0x27c/0x2b0
[Wed Feb 21 12:06:34 2024]  ksys_read+0xe4/0x108
[Wed Feb 21 12:06:34 2024]  __arm64_sys_read+0x24/0x38
[Wed Feb 21 12:06:34 2024]  invoke_syscall.constprop.0+0x58/0xf8
[Wed Feb 21 12:06:34 2024]  do_el0_svc+0x174/0x1a0
[Wed Feb 21 12:06:34 2024]  el0_svc+0x38/0xf0
[Wed Feb 21 12:06:34 2024]  el0t_64_sync_handler+0xbc/0x138
[Wed Feb 21 12:06:34 2024]  el0t_64_sync+0x18c/0x190
[Wed Feb 21 12:06:34 2024] Code: b94096c0 f9001360 f9400ac0 f9427c00 (f9474014)
[Wed Feb 21 12:06:34 2024] ---[ end trace 0000000000000000 ]---

Not sure if there's a similar fix pending for arm64, or whether this is more of a cross-platform problem.

Ignat

> > Yan
> >
> > > Then it must be the vsyscall address that this series is fixing:
> > > https://patchwork.kernel.org/project/netdevbpf/patch/20240202103935.3154011-3-houtao@xxxxxxxxxxxxxxx/
> > >
> > > We're still waiting on x86 maintainers to ack them.
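(For reference, the series linked above teaches x86 to refuse the vsyscall page before copy_from_kernel_nofault() ever touches it. A simplified sketch of that check, assuming the arch hook name copy_from_kernel_nofault_allowed() from mm/maccess.c and x86-64's VSYSCALL_ADDR/TASK_SIZE_MAX constants; the actual patch also validates that the address is canonical:

    /*
     * Simplified sketch of the x86 copy_from_kernel_nofault_allowed()
     * check from the linked series. The vsyscall page is a fixed 4K
     * mapping at 0xffffffffff600000 that looks like a kernel address
     * but cannot be safely read through this path, so it is rejected
     * up front; everything below the kernel address range is likewise
     * refused as a user address.
     */
    bool copy_from_kernel_nofault_allowed(const void *unsafe_src, size_t size)
    {
            unsigned long vaddr = (unsigned long)unsafe_src;

            /* the vsyscall page is also considered invalid */
            if (vaddr >= VSYSCALL_ADDR && vaddr < VSYSCALL_ADDR + PAGE_SIZE)
                    return false;

            /* otherwise only genuine kernel addresses are allowed */
            return vaddr >= TASK_SIZE_MAX;
    }

copy_from_kernel_nofault() consults this hook before attempting the copy, so a bpf probe read of such an address fails cleanly with -EFAULT instead of oopsing.)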