Re: [PATCH bpf-next v3 1/2] bpf: Simplify checking size of helper accesses

Andrei Matei <andreimatei1@xxxxxxxxx> · Thu, 21 Dec 2023 12:51:14 -0500

On Wed, Dec 20, 2023 at 11:30 PM Andrii Nakryiko
<andrii.nakryiko@xxxxxxxxx> wrote:
>
> On Wed, Dec 20, 2023 at 9:06 AM Andrei Matei <andreimatei1@xxxxxxxxx> wrote:
> >
> > This patch simplifies the verification of size arguments associated with
> > pointer arguments to helpers and kfuncs. Many helpers take a pointer
> > argument followed by the size of the memory access to be performed
> > through that pointer. Before this patch, the handling of the
> > size argument in check_mem_size_reg() was confusing and wasteful: if the
> > size register's lower bound was 0, then the verification was done twice:
> > once considering the size of the access to be the lower-bound of the
> > respective argument, and once considering the upper bound (even if the
> > two are the same). The upper bound checking is a super-set of the
> > lower-bound checking(*), except: the only point of the lower-bound check
> > is to handle the case where zero-sized-accesses are explicitly not
> > allowed and the lower-bound is zero. This static condition is now
> > checked explicitly, replacing a much more complex, expensive and
> > confusing verification call to check_helper_mem_access().
> >
> > Now that check_mem_size_reg() deals directly with the zero_size_allowed
> > checking, the single remaining call to check_helper_mem_access() can
> > pass a static value for the zero_size_allowed arg, instead of
> > propagating a dynamic one. I think this is an improvement, as tracking
> > the wide propagation of zero_size_allowed is already complicated.
> >
> > Error messages change in this patch. Before, messages about illegal
> > zero-size accesses depended on the type of the pointer and on other
> > conditions, and sometimes the message was plain wrong: in some tests
> > that changed you'll see that the old message was something like "R1 min
> > value is outside of the allowed memory range", where R1 is the pointer
> > register; the error was wrongly claiming that the pointer was bad
> > instead of the size being bad. Other times the information that the size
> > came from a register with a possible range of values was wrong, and the
> > error presented the size as a fixed zero. Now the errors refer to the
> > right register. However, the old error messages did contain useful
> > information about the pointer register which is now lost. The next patch
> > will bring that information back.
> >
> > (*) Besides standing to reason that the checks for a bigger size access
> > are a super-set of the checks for a smaller size access, I have also
> > mechanically verified this by reading the code for all types of
> > pointers. I could convince myself that it's true for all but
> > PTR_TO_BTF_ID (check_ptr_to_btf_access). There, simply looking
> > line-by-line does not immediately prove what we want. If anyone has any
> > qualms, let me know.
> >
> > Signed-off-by: Andrei Matei <andreimatei1@xxxxxxxxx>
> > ---
> >  kernel/bpf/verifier.c                         | 28 ++++++++----
> >  .../bpf/progs/verifier_helper_value_access.c  | 45 +++++++++++++++++--
> >  .../selftests/bpf/progs/verifier_raw_stack.c  |  2 +-
> >  3 files changed, 61 insertions(+), 14 deletions(-)
> >
> > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > index 1863826a4ac3..4409b8f2b0f3 100644
> > --- a/kernel/bpf/verifier.c
> > +++ b/kernel/bpf/verifier.c
> > @@ -7267,6 +7267,7 @@ static int check_mem_size_reg(struct bpf_verifier_env *env,
> >                               bool zero_size_allowed,
> >                               struct bpf_call_arg_meta *meta)
> >  {
> > +       const bool size_is_const = tnum_is_const(reg->var_off);
> >         int err;
> >
> >         /* This is used to refine r0 return value bounds for helpers
> > @@ -7282,7 +7283,7 @@ static int check_mem_size_reg(struct bpf_verifier_env *env,
> >         /* The register is SCALAR_VALUE; the access check
> >          * happens using its boundaries.
> >          */
> > -       if (!tnum_is_const(reg->var_off))
> > +       if (!size_is_const)
> >                 /* For unprivileged variable accesses, disable raw
> >                  * mode so that the program is required to
> >                  * initialize all the memory that the helper could
> > @@ -7296,12 +7297,9 @@ static int check_mem_size_reg(struct bpf_verifier_env *env,
> >                 return -EACCES;
> >         }
> >
> > -       if (reg->umin_value == 0) {
> > -               err = check_helper_mem_access(env, regno - 1, 0,
> > -                                             zero_size_allowed,
> > -                                             meta);
> > -               if (err)
> > -                       return err;
> > +       if (reg->umin_value == 0 && !zero_size_allowed) {
> > +               verbose(env, "R%d invalid zero-sized read\n", regno);
> > +               return -EACCES;
> >         }
> >
>
> I feel like this simplification is the only one necessary. Code change
> below (for umax) seems unnecessary.
>
> >         if (reg->umax_value >= BPF_MAX_VAR_SIZ) {
> > @@ -7309,9 +7307,21 @@ static int check_mem_size_reg(struct bpf_verifier_env *env,
> >                         regno);
> >                 return -EACCES;
> >         }
> > +       /* If !zero_size_allowed, we already checked that umin_value > 0, so
> > +        * umax_value should also be > 0.
> > +        */
> > +       if (reg->umax_value == 0 && !zero_size_allowed) {
> > +               verbose(env, "verifier bug: !zero_size_allowed should have been handled already\n");
> > +               return -EFAULT;
> > +       }
>
> This check seems unnecessary. If we have a bug and umax < umin, then
> a) we should detect it earlier in reg bounds sanity check and b)
> check_helper_mem_access would still reject umax==0 case if
> !zero_size_allowed. On the other hand, this check does nothing if
> zero_size_allowed==true.
>
> So it's at best partially useful, I'd just drop it. If you do drop it,
> please add my ack to the next revision, thanks. (I might disappear due
> to holidays, so might be slow to review/reply going forward).
>
> Acked-by: Andrii Nakryiko <andrii@xxxxxxxxxx>
>
> >         err = check_helper_mem_access(env, regno - 1,
> > -                                     reg->umax_value,
> > -                                     zero_size_allowed, meta);
> > +                               reg->umax_value,
> > +                               /* zero_size_allowed: we asserted above that umax_value is not
> > +                                * zero if !zero_size_allowed, so we don't need any further
> > +                                * checks.
> > +                                */
> > +                               true,
> > +                               meta);
>
> and here if we leave zero_size_allowed, what's the worst that can
> happen? I'd keep the original call as is.

Nothing bad will happen. I can revert these changes if you want, no problem.
But:
This code change was not meant to have any effect at run-time; its point was
to simplify the code conceptually. The way I see it, terminating the
dynamic aspect of zero_size_allowed here is a good thing: with this change, all
callers now pass a static constant as zero_size_allowed to
check_helper_mem_access(), so tracking the possible values of the argument
becomes much easier. I generally dislike the fact that a lot of functions have
this zero_size_allowed argument; I've tried to figure out some alternative
where zero-sized reads are summarily rejected somewhere high-up so that
functions like check_packet_access, check_map_access, check_mem_region_access,
check_buffer_access, check_stack_range_initialized do not need this argument
any more. But so far I came up empty-handed and gave up for now, given that
these functions are called from multiple places. Still, I see
check_mem_size_reg() passing a static `true` as a step in the right direction
for future refactorings.
Similarly, the point of the assertion I've added above was not that it's
"necessary"; the point was for it to act like commentary assuring the reader
that the value of zero_size_allowed doesn't matter any more.
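
To make the argument above concrete, here is a rough userspace model of the
control flow I have in mind. It is not kernel code: struct size_bounds,
model_check_access() and model_check_mem_size() are made-up names, the region
is reduced to a plain byte count, and the BPF_MAX_VAR_SIZ and negative-size
checks are left out. It only illustrates why the callee can be handed a
static `true` once the zero-size case has been rejected up front.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the unsigned bounds of the size register. */
struct size_bounds {
	uint64_t umin;
	uint64_t umax;
};

/* Hypothetical stand-in for check_helper_mem_access(): the "memory" is
 * just region_size bytes, and a zero-sized access is rejected unless
 * zero_size_allowed is set.
 */
static int model_check_access(uint64_t region_size, uint64_t access_size,
			      bool zero_size_allowed)
{
	if (access_size == 0 && !zero_size_allowed)
		return -EACCES;
	if (access_size > region_size)
		return -EACCES;
	return 0;
}

/* Simplified model of check_mem_size_reg() as proposed in the patch. */
static int model_check_mem_size(struct size_bounds sz, uint64_t region_size,
				bool zero_size_allowed)
{
	/* Assumed invariant: register bounds are sane. */
	assert(sz.umin <= sz.umax);

	/* The only job of the old lower-bound pass: reject a possibly
	 * zero size when the helper does not allow it.
	 */
	if (sz.umin == 0 && !zero_size_allowed)
		return -EACCES;

	/* From here on, umax == 0 implies zero_size_allowed, so the
	 * callee no longer needs the dynamic value.
	 */
	return model_check_access(region_size, sz.umax, true);
}
```

With a 48-byte region, bounds [0, 4] are rejected when zero sizes are not
allowed, and bounds [1, 56] are rejected by the upper-bound check alone.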

Since we're talking, let me ask you this: would you agree that, if the access
size is zero, the pointer value does not need to be checked *at all*? Meaning,
if zero_size_allowed is true and the size is zero, the verifier can allow even
invalid pointers (or registers that are not a pointer at all) to be used?
Because if the answer is yes, that might help get a cleaner code structure
in place -- because it would mean that verifying zero-sized accesses can be
terminated early for both zero_size_allowed = true and false.
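
As a sketch of what that early termination could look like (purely
hypothetical; this is not what the verifier does today, and
model_check_zero_early() with its ptr_valid and region_size parameters is
invented for illustration), assuming the answer to the question is yes:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: umin/umax are the unsigned bounds of the size
 * register, ptr_valid says whether the pointer register would pass the
 * usual checks, and region_size is the number of bytes it points to.
 */
static int model_check_zero_early(uint64_t umin, uint64_t umax,
				  bool zero_size_allowed,
				  bool ptr_valid, uint64_t region_size)
{
	/* Assumed invariant: register bounds are sane. */
	assert(umin <= umax);

	/* A possibly zero size is rejected outright if not allowed. */
	if (umin == 0 && !zero_size_allowed)
		return -EACCES;

	/* Size known to be zero and zero is allowed: accept without
	 * ever looking at the pointer register.
	 */
	if (umax == 0)
		return 0;

	/* Non-zero upper bound: the pointer has to be checked as usual. */
	if (!ptr_valid || umax > region_size)
		return -EACCES;
	return 0;
}
```

In this model both zero-size outcomes are decided before the pointer is
inspected, so none of the per-pointer-type checks would need a
zero_size_allowed argument.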

>
> >         if (!err)
> >                 err = mark_chain_precision(env, regno);
> >         return err;
> > diff --git a/tools/testing/selftests/bpf/progs/verifier_helper_value_access.c b/tools/testing/selftests/bpf/progs/verifier_helper_value_access.c
> > index 692216c0ad3d..137cce939711 100644
> > --- a/tools/testing/selftests/bpf/progs/verifier_helper_value_access.c
> > +++ b/tools/testing/selftests/bpf/progs/verifier_helper_value_access.c
> > @@ -89,9 +89,14 @@ l0_%=:       exit;                                           \
> >         : __clobber_all);
> >  }
> >
> > +/* Call a function taking a pointer and a size which doesn't allow the size to
> > + * be zero (i.e. bpf_trace_printk() declares the second argument to be
> > + * ARG_CONST_SIZE, not ARG_CONST_SIZE_OR_ZERO). We attempt to pass zero for the
> > + * size and expect to fail.
> > + */
> >  SEC("tracepoint")
> >  __description("helper access to map: empty range")
> > -__failure __msg("invalid access to map value, value_size=48 off=0 size=0")
> > +__failure __msg("R2 invalid zero-sized read")
> >  __naked void access_to_map_empty_range(void)
> >  {
> >         asm volatile ("                                 \
> > @@ -113,6 +118,38 @@ l0_%=:     exit;                                           \
> >         : __clobber_all);
> >  }
> >
> > +/* Like the test above, but this time the size register is not known to be zero;
> > + * its lower-bound is zero though, which is still unacceptible.
>
> typo: unacceptable
>
> we normally add new tests in a separate patch. Fixing existing tests
> to make them pass together with kernel change is the only case where we
> mix selftests changes and kernel changes.
>
> > + */
> > +SEC("tracepoint")
> > +__description("helper access to map: possibly-empty range")
> > +__failure __msg("R2 invalid zero-sized read")
> > +__naked void access_to_map_possibly_empty_range(void)
> > +{
> > +       asm volatile ("                                         \
> > +       r2 = r10;                                               \
> > +       r2 += -8;                                               \
> > +       r1 = 0;                                                 \
> > +       *(u64*)(r2 + 0) = r1;                                   \
> > +       r1 = %[map_hash_48b] ll;                                \
> > +       call %[bpf_map_lookup_elem];                            \
> > +       if r0 == 0 goto l0_%=;                                  \
> > +       r1 = r0;                                                \
> > +       /* Read an unknown value */                             \
> > +       r7 = *(u64*)(r0 + 0);                                   \
> > +       /* Make it small and positive, to avoid other errors */ \
> > +       r7 &= 4;                                                \
> > +       r2 = 0;                                                 \
> > +       r2 += r7;                                               \
> > +       call %[bpf_trace_printk];                               \
> > +l0_%=: exit;                                               \
> > +"      :
> > +       : __imm(bpf_map_lookup_elem),
> > +         __imm(bpf_trace_printk),
> > +         __imm_addr(map_hash_48b)
> > +       : __clobber_all);
> > +}
> > +
> >  SEC("tracepoint")
> >  __description("helper access to map: out-of-bound range")
> >  __failure __msg("invalid access to map value, value_size=48 off=0 size=56")
> > @@ -221,7 +258,7 @@ l0_%=:      exit;                                           \
> >
> >  SEC("tracepoint")
> >  __description("helper access to adjusted map (via const imm): empty range")
> > -__failure __msg("invalid access to map value, value_size=48 off=4 size=0")
> > +__failure __msg("R2 invalid zero-sized read")
> >  __naked void via_const_imm_empty_range(void)
> >  {
> >         asm volatile ("                                 \
> > @@ -386,7 +423,7 @@ l0_%=:      exit;                                           \
> >
> >  SEC("tracepoint")
> >  __description("helper access to adjusted map (via const reg): empty range")
> > -__failure __msg("R1 min value is outside of the allowed memory range")
> > +__failure __msg("R2 invalid zero-sized read")
> >  __naked void via_const_reg_empty_range(void)
> >  {
> >         asm volatile ("                                 \
> > @@ -556,7 +593,7 @@ l0_%=:      exit;                                           \
> >
> >  SEC("tracepoint")
> >  __description("helper access to adjusted map (via variable): empty range")
> > -__failure __msg("R1 min value is outside of the allowed memory range")
> > +__failure __msg("R2 invalid zero-sized read")
> >  __naked void map_via_variable_empty_range(void)
> >  {
> >         asm volatile ("                                 \
> > diff --git a/tools/testing/selftests/bpf/progs/verifier_raw_stack.c b/tools/testing/selftests/bpf/progs/verifier_raw_stack.c
> > index f67390224a9c..3dbda85e2997 100644
> > --- a/tools/testing/selftests/bpf/progs/verifier_raw_stack.c
> > +++ b/tools/testing/selftests/bpf/progs/verifier_raw_stack.c
> > @@ -64,7 +64,7 @@ __naked void load_bytes_negative_len_2(void)
> >
> >  SEC("tc")
> >  __description("raw_stack: skb_load_bytes, zero len")
> > -__failure __msg("invalid zero-sized read")
> > +__failure __msg("R4 invalid zero-sized read")
> >  __naked void skb_load_bytes_zero_len(void)
> >  {
> >         asm volatile ("                                 \
> > --
> > 2.40.1
> >