Re: [PATCH bpf-next v2 1/5] bpf: allow variable-offset stack access

Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> · Mon, 1 Feb 2021 16:22:43 -0800

On Sat, Jan 30, 2021 at 05:55:36PM -0500, Andrei Matei wrote:
> Thanks for reviewing this!
> 
> On Wed, Jan 27, 2021 at 5:58 PM Alexei Starovoitov
> <alexei.starovoitov@xxxxxxxxx> wrote:
> >
> > On Sun, Jan 24, 2021 at 02:49:05PM -0500, Andrei Matei wrote:
> > > + *
> > > + * If some stack slots in range are uninitialized (i.e. STACK_INVALID), the
> > > + * write is not automatically rejected. However, they are left as
> > > + * STACK_INVALID, which means that reads with the same variable offset will be
> > > + * rejected.
> > ...
> > > +             /* If the slot is STACK_INVALID, we leave it as such. We can't
> > > +              * mark the slot as initialized, as the slot might not actually
> > > +              * be written to (and so marking it as initialized opens the
> > > +              * door to leaks of uninitialized stack memory.
> > > +              */
> > > +             if (*stype != STACK_INVALID)
> > > +                     *stype = new_type;
> >
> > 'leaks of uninitialized stack memory'...
> > well that's true, but the way the user would have to deal with this
> > is to use __builtin_memset(&buf, 0, 16); for the whole buffer
> > before doing var_off write into it.
> > In the test in patch 5 would be good to add a read after this write:
> > buf[len] = buf[idx];
> > // need to do another read of buf[len] here.
> >
> > Without memset() it would fail and the user would flame us:
> > "I just wrote into this stack slot!! Why cannot the verifier figure out
> > that the read from the same location is safe?... stupid verifier..."
> >
> > I think for your use case where you read the whole thing into a stack and
> > then parse it all locations potentially touched by reads/writes would
> > be already written via helper, but imo this error is too unpleasant to
> > explain to users.
> > Especially since it happens later at a different instruction there is
> > no good verifier hint we can print.
> > It would just hit 'invalid read from stack'.
> >
> > Currently we don't allow uninit read from stack even for root.
> > I think we have to sacrifice the perfection of the verification here.
> > We can either allow reading uninit for _fixed and _var_off
> > or better yet do unconditional '*stype = new_type' here.
> > Yes it would mean that following _fixed or _var_off read could be
> > reading uninited stack. I think we have to do it for the sake
> > of user friendliness.
> > The completely buggy uninited reads from stack will still be disallowed.
> > Only the [min,max] of var_off range in stack will be considered
> > init, so imo it's a reasonable trade-off between user friendliness
> > and verifier's perfection.
> > Wdyt?
> 
> I'm happy to do whatever you tell me. But, I dunno, the verifier
> currently seems to be paranoid in ways I don't even understand (around
> speculative execution). In comparison, preventing trivial leaks of
> uninitialized memory seems relatively important. We're only talking
> about root here (as you've noted), and other various checks are less
> paranoid for root, so maybe it's no big deal. Where does the stack
> memory come from? Can it be *any* previously used kernel memory?
> A few possible alternatives (without necessarily knowing what I'm
> talking about):
> 1) Perhaps it wouldn't be a big deal to zero-initialize all the stack
> memory (up to 512 bytes) for a program. Is that out of the question?
> In many cases it'd be less than 512 bytes; the verifier knows the max
> stack needed. If the stack was always initialized, various verifier
> checks could go away.

Even if stack usage is small, like 64 byte, bzero of it has noticeable
perf penalty.

> 2) We could leave this patch as is, and work on improving the error
> you get on rejected stack reads after var-offset stack writes. I think
> the verifier could track what stack slots might have or might have not
> been written to, and when a read to such an uncertain slot is
> rejected, it could point to the previous var-off write (or one of the
> possibly many such writes) and suggest a memset. Sounds potentially
> complicated though.

too complicated imo.

> 3) Perhaps we could augment helper metadata with information about
> whether each helper promises to overwrite every byte in the buffer
> it's being passed in. This wouldn't solve the general usability
> problem we're discussing, but it would make common cases just work.
> Namely, bpf_probe_read_user() can make the required promise.
> bpf_probe_read_user_str(), however, could not.

eventually yes. That's orthogonal to this patch set.

> 
> But, again, if you think relaxing the verification is OK, I'm very
> happy to do that.

The tracing progs can read stack with bpf_probe_read_kernel anyway, so
I would prefer to relax it to improve ease of use.
Unconditional *stype = new_type; here would do the trick.
Not for unpriv, of course.
Probably another 'bool' flag (instead of jumbo allow_ptr_leaks)
like 'allow_uninit_stack' that is set with perfmon_capable().

> > >
> > >               tnum_strn(tn_buf, sizeof(tn_buf), reg->var_off);
> > > -             verbose(env, "variable stack access var_off=%s off=%d size=%d\n",
> > > +             verbose(env, "var-offset stack reads only permitted to register; var_off=%s off=%d size=%d\n",
> >
> > The message is confusing. "read to register"? what is "read to not register" ?
> > Users won't be able to figure out that it means helpers access.
> > Also nowadays it means that atomic ops won't be able to use var_off into stack.
> > I think both limitations are ok for now. Only the message needs to be clear.
> 
> What message would you suggest? Would you plumb information about what
> the read type is (helper vs atomic op)?

Something like: "variable offset stack pointer cannot be passed into helper functions" ?