Re: Page faults in tracepoint caused by aliased pointer

On Tue, Feb 13, 2024 at 12:34 AM Kumar Kartikeya Dwivedi
<memxor@xxxxxxxxx> wrote:
>
> On Tue, 13 Feb 2024 at 01:21, Yan Zhai <yan@xxxxxxxxxxxxxx> wrote:
> >
> > On Mon, Feb 12, 2024 at 5:52 PM Alexei Starovoitov
> > <alexei.starovoitov@xxxxxxxxx> wrote:
> > >
> > > On Mon, Feb 12, 2024 at 3:42 PM Kumar Kartikeya Dwivedi
> > > <memxor@xxxxxxxxx> wrote:
> > > >
> > > > On Tue, 13 Feb 2024 at 00:34, Alexei Starovoitov
> > > > <alexei.starovoitov@xxxxxxxxx> wrote:
> > > > >
> > > > > On Mon, Feb 12, 2024 at 3:16 PM Ignat Korchagin <ignat@xxxxxxxxxxxxxx> wrote:
> > > > > >
> > > > > > [288931.217143][T109754] CPU: 4 PID: 109754 Comm: bpftrace Not tainted
> > > > > > 6.6.16+ #10
> > > > >
> > > > > ...
> > > > > > [288931.217143][T109754]  ? copy_from_kernel_nofault+0x1d/0xe0
> > > > > > [288931.217143][T109754]  bpf_probe_read_compat+0x6a/0x90
> > > > > >
> > > > > > And Jakub CCed here did it for 6.8.0-rc2+
> > > > >
> > > > > I suspect something is broken in your kernels.
> > > > > Above is doing generic copy_from_kernel_nofault(),
> > > > > so one should be able to crash the kernel without any bpf.
> > > > >
> > > > > We have this in selftests/bpf:
> > > > > __weak noinline struct file *bpf_testmod_return_ptr(int arg)
> > > > > {
> > > > >         static struct file f = {};
> > > > >
> > > > >         switch (arg) {
> > > > >         case 1: return (void *)EINVAL;          /* user addr */
> > > > >         case 2: return (void *)0xcafe4a11;      /* user addr */
> > > > >         case 3: return (void *)-EINVAL;         /* canonical, but invalid */
> > > > >         case 4: return (void *)(1ull << 60);    /* non-canonical and invalid */
> > > > >         case 5: return (void *)~(1ull << 30);   /* trigger extable */
> > > > >         case 6: return &f;                      /* valid addr */
> > > > >         case 7: return (void *)((long)&f | 1);  /* kernel tricks */
> > > > >         default: return NULL;
> > > > >         }
> > > > > }
> > > > > where we check that extables setup by JIT for bpf progs are working correctly.
> > > > > You should see the kernel crashing when you just run bpf selftests.
> > > >
> > > > I agree, this appears unrelated to BPF since it is happening when
> > > > using copy_from_kernel_nofault (which should be jumping to the Efault
> > > > label instead of the oops), but I think it's not specific to some
> > > > custom kernel. I can reproduce it on my dev machine on top of bpf-next
> > > > as well, and another machine with Ubuntu's generic 6.5 kernel for
> > > > 24.04. And I think Ignat tried it on the mainline 6.8-rc2 as well.
> > >
> > copy_from_kernel_nofault is called in Jakub's reproducer, but the
> > panic case in our production seems to be direct memory accessing
> > according to bpftool dumped jited code. Will faults from such
> > instructions also be caught correctly?
> >
>
> Yep, since faults in both cases end up in the page fault handler.
> Once the fix pointed out by Alexei is applied, it should address both scenarios.

Just as a follow-up: the patches do seem to help on x86, but we
recently hit a similar problem on arm64 (6.1.74 kernel):

[Wed Feb 21 12:06:33 2024] Unable to handle kernel access to user
memory outside uaccess routines at virtual address 00007fff9959b150
[Wed Feb 21 12:06:33 2024] Mem abort info:
[Wed Feb 21 12:06:33 2024]   ESR = 0x000000009600000f
[Wed Feb 21 12:06:33 2024]   EC = 0x25: DABT (current EL), IL = 32 bits
[Wed Feb 21 12:06:33 2024]   SET = 0, FnV = 0
[Wed Feb 21 12:06:33 2024]   EA = 0, S1PTW = 0
[Wed Feb 21 12:06:33 2024]   FSC = 0x0f: level 3 permission fault
[Wed Feb 21 12:06:33 2024] Data abort info:
[Wed Feb 21 12:06:33 2024]   ISV = 0, ISS = 0x0000000f
[Wed Feb 21 12:06:33 2024]   CM = 0, WnR = 0
[Wed Feb 21 12:06:33 2024] user pgtable: 4k pages, 48-bit VAs,
pgdp=00000812b1f69000
[Wed Feb 21 12:06:33 2024] [00007fff9959b150] pgd=08000812b1f72003,
p4d=08000812b1f72003, pud=08000812b1ff2003, pmd=08000855b2eb4003,
pte=0068087760598fc3
[Wed Feb 21 12:06:33 2024] Internal error: Oops: 000000009600000f [#1] SMP
[Wed Feb 21 12:06:33 2024] Modules linked in: nft_compat xt_hashlimit
ip_set_hash_netport xt_length esp4 nf_conntrack_netlink zstd
zstd_compress zram zsmalloc xgene_edac dm_thin_pool dm_persistent_data
dm_bio_prison dm_bufio nft_fwd_netdev nf_dup_netdev xfrm_interface
xfrm6_tunnel mpls_gso mpls_iptunnel mpls_router sit nft_numgen nft_log
nft_limit dummy ipip tunnel4 xfrm_user xfrm_algo nft_ct iptable_raw
iptable_nat iptable_mangle ipt_REJECT nf_reject_ipv4 ip6table_security
xt_CT ip6table_raw xt_nat ip6table_nat nf_nat xt_TCPMSS xt_owner
xt_NFLOG xt_connbytes xt_connlabel xt_statistic xt_connmark
ip6table_mangle xt_limit xt_LOG nf_log_syslog xt_mark xt_tcpudp
xt_conntrack ip6t_REJECT nf_reject_ipv6 xt_multiport xt_set xt_tcpmss
xt_comment ip6table_filter ip6_tables iptable_filter nfnetlink_log
tcp_diag cls_bpf sch_ingress ip_gre gre geneve tun xt_bpf nf_conntrack
nf_defrag_ipv6 nf_defrag_ipv4 fou6 fou ip_tunnel ip6_udp_tunnel
udp_tunnel ip6_tunnel tunnel6 veth nf_tables tcp_bbr sch_fq
[Wed Feb 21 12:06:33 2024]  ip_set_hash_ip ip_set_hash_net ip_set
nfnetlink udp_diag inet_diag raid0 md_mod dm_crypt trusted
asn1_encoder tee algif_skcipher af_alg 8021q garp mrp stp llc
nvme_fabrics crct10dif_ce ghash_ce acpi_ipmi mlx5_core sha2_ce
ipmi_ssif sha256_arm64 sha1_ce mlxfw ipmi_devintf arm_spe_pmu
tiny_power_button tls igb xhci_pci nvme psample nvme_core xhci_hcd
ipmi_msghandler i2c_algo_bit button i2c_designware_platform
i2c_designware_core cppc_cpufreq arm_dsu_pmu tpm_tis tpm_tis_core fuse
dm_mod dax efivarfs ip_tables x_tables bcmcrypt(O) aes_neon_bs
aes_neon_blk aes_ce_blk aes_ce_cipher [last unloaded: kheaders]
[Wed Feb 21 12:06:33 2024] CPU: 15 PID: 547138 Comm: nginx-ssl
Tainted: G           O       6.1.74-cloudflare-2024.1.14 #1
[Wed Feb 21 12:06:33 2024] Hardware name: GIGABYTE
[Wed Feb 21 12:06:33 2024] pstate: 20400009 (nzCv daif +PAN -UAO -TCO
-DIT -SSBS BTYPE=--)
[Wed Feb 21 12:06:33 2024] pc : 0xffff8000288c0674
[Wed Feb 21 12:06:33 2024] lr : 0xffff8000288c064c
[Wed Feb 21 12:06:33 2024] sp : ffff8000afdd3940
[Wed Feb 21 12:06:33 2024] x29: ffff8000afdd39d0 x28: ffff081142f99f80
x27: ffff8000afdd3940
[Wed Feb 21 12:06:33 2024] x26: 0000000000000000 x25: ffff8000afdd3990
x24: 0000000000000001
[Wed Feb 21 12:06:33 2024] x23: 000000002e4773f7 x22: ffff0800e7078300
x21: ffff08378b4c5180
[Wed Feb 21 12:06:33 2024] x20: 0000000000000000 x19: fffffbff5dc7d548
x18: 0000000000000000
[Wed Feb 21 12:06:33 2024] x17: 0000000000000000 x16: 0000000000000000
x15: ffff081b6e9e8196
[Wed Feb 21 12:06:33 2024] x14: 0000000000000000 x13: 0000000000000000
x12: 0000000000000000
[Wed Feb 21 12:06:33 2024] x11: 0000000000000000 x10: ffffda25e4cc90f0
x9 : ffffda25e4d71074
[Wed Feb 21 12:06:33 2024] x8 : ffff8000afdd3af8 x7 : 0000000000000000
x6 : 0000008124f0e5a3
[Wed Feb 21 12:06:33 2024] x5 : ffff80023c9cd000 x4 : 0000000000001000
x3 : 0000000000000008
[Wed Feb 21 12:06:33 2024] x2 : ffff081142f99f80 x1 : ffffda25e55e76a0
x0 : 00007fff9959a2d0
[Wed Feb 21 12:06:33 2024] Call trace:
[Wed Feb 21 12:06:33 2024]  0xffff8000288c0674
[Wed Feb 21 12:06:33 2024]  bpf_trace_run3+0xcc/0x148
[Wed Feb 21 12:06:34 2024]  __bpf_trace_kfree_skb+0x14/0x20
[Wed Feb 21 12:06:34 2024]  __traceiter_kfree_skb+0x50/0x78
[Wed Feb 21 12:06:34 2024]  kfree_skb_reason+0xa8/0x118
[Wed Feb 21 12:06:34 2024]  tcp_data_queue+0x9f8/0xe20
[Wed Feb 21 12:06:34 2024]  tcp_rcv_established+0x2b4/0x738
[Wed Feb 21 12:06:34 2024]  tcp_v4_do_rcv+0x194/0x2d8
[Wed Feb 21 12:06:34 2024]  __release_sock+0x90/0x138
[Wed Feb 21 12:06:34 2024]  release_sock+0x64/0x120
[Wed Feb 21 12:06:34 2024]  tcp_recvmsg+0x80/0x1c8
[Wed Feb 21 12:06:34 2024]  inet_recvmsg+0x50/0xf8
[Wed Feb 21 12:06:34 2024]  sock_read_iter+0xf4/0x128
[Wed Feb 21 12:06:34 2024]  vfs_read+0x27c/0x2b0
[Wed Feb 21 12:06:34 2024]  ksys_read+0xe4/0x108
[Wed Feb 21 12:06:34 2024]  __arm64_sys_read+0x24/0x38
[Wed Feb 21 12:06:34 2024]  invoke_syscall.constprop.0+0x58/0xf8
[Wed Feb 21 12:06:34 2024]  do_el0_svc+0x174/0x1a0
[Wed Feb 21 12:06:34 2024]  el0_svc+0x38/0xf0
[Wed Feb 21 12:06:34 2024]  el0t_64_sync_handler+0xbc/0x138
[Wed Feb 21 12:06:34 2024]  el0t_64_sync+0x18c/0x190
[Wed Feb 21 12:06:34 2024] Code: b94096c0 f9001360 f9400ac0 f9427c00 (f9474014)
[Wed Feb 21 12:06:34 2024] ---[ end trace 0000000000000000 ]---

Not sure whether a similar fix is pending for arm64, or whether this
is more of a cross-platform problem.

Ignat

> > Yan
> >
> > > Then it must be vsyscall address that this series are fixing:
> > > https://patchwork.kernel.org/project/netdevbpf/patch/20240202103935.3154011-3-houtao@xxxxxxxxxxxxxxx/
> > >
> > > We're still waiting on x86 maintainers to ack them.




