Re: [PATCH stable 5.10] bpf: Allow reads from uninit stack

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Jul 11, 2024 at 09:43:21PM +0300, Maxim Mikityanskiy wrote:
> From: Eduard Zingerman <eddyz87@xxxxxxxxx>
> 
> [ Upstream commit 6715df8d5d24655b9fd368e904028112b54c7de1 ]
> 
> This commits updates the following functions to allow reads from
> uninitialized stack locations when env->allow_uninit_stack option is
> enabled:
> - check_stack_read_fixed_off()
> - check_stack_range_initialized(), called from:
>   - check_stack_read_var_off()
>   - check_helper_mem_access()
> 
> Such change allows to relax logic in stacksafe() to treat STACK_MISC
> and STACK_INVALID in a same way and make the following stack slot
> configurations equivalent:
> 
>   |  Cached state    |  Current state   |
>   |   stack slot     |   stack slot     |
>   |------------------+------------------|
>   | STACK_INVALID or | STACK_INVALID or |
>   | STACK_MISC       | STACK_SPILL   or |
>   |                  | STACK_MISC    or |
>   |                  | STACK_ZERO    or |
>   |                  | STACK_DYNPTR     |
> 
> This leads to significant verification speed gains (see below).
> 
> The idea was suggested by Andrii Nakryiko [1] and initial patch was
> created by Alexei Starovoitov [2].
> 
> Currently the env->allow_uninit_stack is allowed for programs loaded
> by users with CAP_PERFMON or CAP_SYS_ADMIN capabilities.
> 
> A number of test cases from verifier/*.c were expecting uninitialized
> stack access to be an error. These test cases were updated to execute
> in unprivileged mode (thus preserving the tests).
> 
> The test progs/test_global_func10.c expected "invalid indirect read
> from stack" error message because of the access to uninitialized
> memory region. This error is no longer possible in privileged mode.
> The test is updated to provoke an error "invalid indirect access to
> stack" because of access to invalid stack address (such error is not
> verified by progs/test_global_func*.c series of tests).
> 
> The following tests had to be removed because these can't be made
> unprivileged:
> - verifier/sock.c:
>   - "sk_storage_get(map, skb->sk, &stack_value, 1): partially init
>   stack_value"
>   BPF_PROG_TYPE_SCHED_CLS programs are not executed in unprivileged mode.
> - verifier/var_off.c:
>   - "indirect variable-offset stack access, max_off+size > max_initialized"
>   - "indirect variable-offset stack access, uninitialized"
>   These tests verify that access to uninitialized stack values is
>   detected when stack offset is not a constant. However, variable
>   stack access is prohibited in unprivileged mode, thus these tests
>   are no longer valid.
> 
>  * * *
> 
> Here is veristat log comparing this patch with current master on a
> set of selftest binaries listed in tools/testing/selftests/bpf/veristat.cfg
> and cilium BPF binaries (see [3]):
> 
> $ ./veristat -e file,prog,states -C -f 'states_pct<-30' master.log current.log
> File                        Program                     States (A)  States (B)  States    (DIFF)
> --------------------------  --------------------------  ----------  ----------  ----------------
> bpf_host.o                  tail_handle_ipv6_from_host         349         244    -105 (-30.09%)
> bpf_host.o                  tail_handle_nat_fwd_ipv4          1320         895    -425 (-32.20%)
> bpf_lxc.o                   tail_handle_nat_fwd_ipv4          1320         895    -425 (-32.20%)
> bpf_sock.o                  cil_sock4_connect                   70          48     -22 (-31.43%)
> bpf_sock.o                  cil_sock4_sendmsg                   68          46     -22 (-32.35%)
> bpf_xdp.o                   tail_handle_nat_fwd_ipv4          1554         803    -751 (-48.33%)
> bpf_xdp.o                   tail_lb_ipv4                      6457        2473   -3984 (-61.70%)
> bpf_xdp.o                   tail_lb_ipv6                      7249        3908   -3341 (-46.09%)
> pyperf600_bpf_loop.bpf.o    on_event                           287         145    -142 (-49.48%)
> strobemeta.bpf.o            on_event                         15915        4772  -11143 (-70.02%)
> strobemeta_nounroll2.bpf.o  on_event                         17087        3820  -13267 (-77.64%)
> xdp_synproxy_kern.bpf.o     syncookie_tc                     21271        6635  -14636 (-68.81%)
> xdp_synproxy_kern.bpf.o     syncookie_xdp                    23122        6024  -17098 (-73.95%)
> --------------------------  --------------------------  ----------  ----------  ----------------
> 
> Note: I limited selection by states_pct<-30%.
> 
> Inspection of differences in pyperf600_bpf_loop behavior shows that
> the following patch for the test removes almost all differences:
> 
>     - a/tools/testing/selftests/bpf/progs/pyperf.h
>     + b/tools/testing/selftests/bpf/progs/pyperf.h
>     @ -266,8 +266,8 @ int __on_event(struct bpf_raw_tracepoint_args *ctx)
>             }
> 
>             if (event->pthread_match || !pidData->use_tls) {
>     -               void* frame_ptr;
>     -               FrameData frame;
>     +               void* frame_ptr = 0;
>     +               FrameData frame = {};
>                     Symbol sym = {};
>                     int cur_cpu = bpf_get_smp_processor_id();
> 
> W/o this patch the difference comes from the following pattern
> (for different variables):
> 
>     static bool get_frame_data(... FrameData *frame ...)
>     {
>         ...
>         bpf_probe_read_user(&frame->f_code, ...);
>         if (!frame->f_code)
>             return false;
>         ...
>         bpf_probe_read_user(&frame->co_name, ...);
>         if (frame->co_name)
>             ...;
>     }
> 
>     int __on_event(struct bpf_raw_tracepoint_args *ctx)
>     {
>         FrameData frame;
>         ...
>         get_frame_data(... &frame ...) // indirectly via a bpf_loop & callback
>         ...
>     }
> 
>     SEC("raw_tracepoint/kfree_skb")
>     int on_event(struct bpf_raw_tracepoint_args* ctx)
>     {
>         ...
>         ret |= __on_event(ctx);
>         ret |= __on_event(ctx);
>         ...
>     }
> 
> With regards to value `frame->co_name` the following is important:
> - Because of the conditional `if (!frame->f_code)` each call to
>   __on_event() produces two states, one with `frame->co_name` marked
>   as STACK_MISC, another with it as is (and marked STACK_INVALID on a
>   first call).
> - The call to bpf_probe_read_user() does not mark stack slots
>   corresponding to `&frame->co_name` as REG_LIVE_WRITTEN but it marks
>   these slots as BPF_MISC, this happens because of the following loop
>   in the check_helper_call():
> 
> 	for (i = 0; i < meta.access_size; i++) {
> 		err = check_mem_access(env, insn_idx, meta.regno, i, BPF_B,
> 				       BPF_WRITE, -1, false);
> 		if (err)
> 			return err;
> 	}
> 
>   Note the size of the write, it is a one byte write for each byte
>   touched by a helper. The BPF_B write does not lead to write marks
>   for the target stack slot.
> - Which means that w/o this patch when second __on_event() call is
>   verified `if (frame->co_name)` will propagate read marks first to a
>   stack slot with STACK_MISC marks and second to a stack slot with
>   STACK_INVALID marks and these states would be considered different.
> 
> [1] https://lore.kernel.org/bpf/CAEf4BzY3e+ZuC6HUa8dCiUovQRg2SzEk7M-dSkqNZyn=xEmnPA@xxxxxxxxxxxxxx/
> [2] https://lore.kernel.org/bpf/CAADnVQKs2i1iuZ5SUGuJtxWVfGYR9kDgYKhq3rNV+kBLQCu7rA@xxxxxxxxxxxxxx/
> [3] git@xxxxxxxxxx:anakryiko/cilium.git
> 
> Suggested-by: Andrii Nakryiko <andrii@xxxxxxxxxx>
> Co-developed-by: Alexei Starovoitov <ast@xxxxxxxxxx>
> Signed-off-by: Eduard Zingerman <eddyz87@xxxxxxxxx>
> Acked-by: Andrii Nakryiko <andrii@xxxxxxxxxx>
> Link: https://lore.kernel.org/r/20230219200427.606541-2-eddyz87@xxxxxxxxx
> Signed-off-by: Alexei Starovoitov <ast@xxxxxxxxxx>
> Signed-off-by: Maxim Mikityanskiy <maxim@xxxxxxxxxxxxx>
> ---
> Backporting to address the complexity regression introduced by commit
> 71f656a50176 ("bpf: Fix to preserve reg parent/live fields when copying
> range info"), that affects Cilium built with LLVM 18.

All now queued up, thanks.

greg k-h




[Index of Archives]     [Linux Kernel]     [Kernel Development Newbies]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite Hiking]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux