Re: ebpf can allow sub-word load results to be used as 64-bit pointers

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sat, 30 Nov 2024 at 18:10, <rtm@xxxxxxxxxxxxx> wrote:
>
> When I modify the bpf_cubic_init() function from
>
>   https://github.com/aroodgar/bpf-tcp-congestion-control-algorithm
>
> to read:
>
> SEC("struct_ops/bpf_cubic_init")
> void BPF_PROG(bpf_cubic_init, struct sock *sk)
> {
>   asm volatile("r2 = *(u16*)(r1 + 0)");     // verifier should demand u64
>   asm volatile("*(u32 *)(r2 +1504) = 0");   // 1280 in some configs
> }
>
> the verifier accepts it, but the second line crashes when it runs during
> a TCP connect() because the "*(u16*)" in the load from context yields
> only the low bits of the pointer.

Thanks for the bug report.

It is a missing check on "size" of the access in btf_ctx_access in
kernel/bpf/btf.c.
It is probably not seen in practice as desugaring of such arguments is
done by libbpf macros etc.
This won't just be limited to struct_ops, but many other program types
(tracing, etc.) which use this code.

I have prepared a fix here: https://github.com/kkdwivedi/linux/commits/sops

The relevant diff is:

diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index e7a59e6462a9..f590cb792cf3 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -6458,6 +6458,11 @@ bool btf_ctx_access(int off, int size, enum
bpf_access_type type,
                        tname, off);
                return false;
        }
+       if (size != sizeof(u64)) {
+               bpf_log(log, "func '%s' size %d is not multiple of 8\n",
+                       tname, size);
+               return false;
+       }
        arg = get_ctx_arg_idx(btf, t, off);
        args = (const struct btf_param *)(t + 1);
        /* if (t == NULL) Fall back to default BPF prog with


This is not the right 100% fix though, as fields may be u32 scalars
etc. so size needs to be checked against the member's type.
Right now this will reject some valid programs. But you get the idea.
The size of access needs to be checked.
Will test out in our CI before posting to the list.

May I ask how you happened to stumble upon this?

>
> Linux ubuntu66 6.12.0-11677-g2ba9f676d0a2 #10 SMP Sat Nov 30 11:28:09 EST 2024 x86_64 x86_64 x86_64 GNU/Linux
>
> BUG: unable to handle page fault for address: 0000000000001020
> #PF: supervisor write access in kernel mode
> #PF: error_code(0x0002) - not-present page
> PGD 0 P4D 0
> Oops: Oops: 0002 [#1] SMP PTI
> CPU: 6 UID: 0 PID: 1546 Comm: a.out Not tainted 6.12.0-11677-g2ba9f676d0a2 #10
> Hardware name: FreeBSD BHYVE/BHYVE, BIOS 14.0 10/17/2021
> RIP: 0010:bpf_prog_0e20ff5294b59096_bpf_cubic_init+0x19/0x25
> Code: 00 cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc f3 0f 1e fa 0f 1f 44 00
>  00 0f 1f 00 55 48 89 e5 f3 0f 1e fa 48 0f b7 77 00 <c7> 86 e0 05 00 00 00 00 00
>  00 c9 c3 cc cc cc cc cc cc cc cc cc cc
> RSP: 0018:ffffc9000076bc58 EFLAGS: 00010202
> RAX: 0000000000000001 RBX: 0000000000000001 RCX: 0000000000000000
> RDX: 0000000000000000 RSI: 0000000000000a40 RDI: ffffc9000076bc88
> RBP: ffffc9000076bc58 R08: ffffc9000076bc40 R09: ffffc9000076bc20
> R10: ffffffff842f3200 R11: ffffffff827659c0 R12: 0000000000000004
> R13: ffff888105493098 R14: 0000000000000000 R15: 00000000ffffff8d
> FS:  00007fa5348d1740(0000) GS:ffff88842fb80000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000000000001020 CR3: 000000010bbfa003 CR4: 00000000003706f0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Call Trace:
>  <TASK>
>  ? __die+0x1e/0x60
>  ? page_fault_oops+0x157/0x450
>  ? eth_header+0x25/0xb0
>  ? exc_page_fault+0x66/0x140
>  ? asm_exc_page_fault+0x26/0x30
>  ? bpf_prog_0e20ff5294b59096_bpf_cubic_init+0x19/0x25
>  ? __bpf_prog_enter+0x14/0x60
>  bpf__tcp_congestion_ops_init+0x47/0xa3
>  tcp_init_congestion_control+0x2a/0xe0
>  tcp_init_transfer+0x2b2/0x2d0
>  tcp_finish_connect+0x82/0x130
>  tcp_rcv_state_process+0x352/0xf20
>  tcp_v4_do_rcv+0xca/0x240
>  __release_sock+0xc6/0xd0
>  release_sock+0x2a/0x90
>  __inet_stream_connect+0x208/0x3c0
>  ? __pfx_woken_wake_function+0x10/0x10
>  inet_stream_connect+0x35/0x50
>  __sys_connect+0x93/0xb0
>  ? ksys_write+0x67/0xe0
>  __x64_sys_connect+0x13/0x20
>  ? ksys_write+0x67/0xe0
>  __x64_sys_connect+0x13/0x20
>  do_syscall_64+0x3f/0xd0
>  entry_SYSCALL_64_after_hwframe+0x76/0x7e
>
> Robert Morris
> rtm@xxxxxxx
>
>




[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux