Re: [PATCH v2 bpf-next 0/9] add new acquire/release BPF kfuncs

Christian Brauner <brauner@xxxxxxxxxx> · Thu, 7 Mar 2024 10:54:59 +0100

On Wed, Mar 06, 2024 at 08:05:05PM -0800, Alexei Starovoitov wrote:
> On Wed, Mar 6, 2024 at 4:13 AM Christian Brauner <brauner@xxxxxxxxxx> wrote:
> >
> > On Wed, Mar 06, 2024 at 12:21:28PM +0100, Christian Brauner wrote:
> > > On Wed, Mar 06, 2024 at 07:39:14AM +0000, Matt Bobrowski wrote:
> > > > G'day All,
> > > >
> > > > The original cover letter providing background context and motivating
> > > > factors around the needs for the BPF kfuncs introduced within this
> > > > patch series can be found here [0], so please do reference that if
> > > > need be.
> > > >
> > > > Notably, one of the main contention points within v1 of this patch
> > > > series was that we were effectively leaning on some preexisting
> > > > in-kernel APIs such as get_task_exe_file() and get_mm_exe_file()
> > > > within some of the newly introduced BPF kfuncs. As noted in my
> > > > response here [1] though, I struggle to understand the technical
> > > > reasoning behind why exposing such in-kernel helpers, specifically
> > > > only to BPF LSM program types in the form of BPF kfuncs, is inherently
> > > > a terrible idea. So, until someone provides me with a sound technical
> > > > explanation as to why this cannot or should not be done, I'll continue
> > > > to lean on them. The alternative is to reimplement the necessary
> > > > in-kernel APIs within the BPF kfuncs, but that's just nonsensical IMO.
> > >
> > > You may lean as much as you like. What I've reacted to is that you've
> > > (not you specifically, I'm sure) messed up. You've exposed d_path() to
> > > users  without understanding that it wasn't safe apparently.
> > >
> > > And now we get patches that use the self-inflicted brokeness as an
> > > argument to expose a bunch of other low-level helpers to fix that.
> > >
> > > The fact that it's "just bpf LSM" programs doesn't alleviate any
> > > concerns whatsoever. Not just because that is just an entry vector but
> > > also because we have LSMs induced API abuse that we only ever get to see
> > > the fallout from when we refactor apis and then it causes pain for the vfs.
> > >
> > > I'll take another look at the proposed helpers you need as bpf kfuncs
> > > and I'll give my best not to be overly annoyed by all of this. I have no
> > > intention of not helping you quite the opposite but I'm annoyed that
> > > we're here in the first place.
> > >
> > > What I want is to stop this madness of exposing stuff to users without
> > > fully understanding it's semantics and required guarantees.
> >
> > So, looking at this series you're now asking us to expose:
> >
> > (1) mmgrab()
> > (2) mmput()
> > (3) fput()
> > (5) get_mm_exe_file()
> > (4) get_task_exe_file()
> > (7) get_task_fs_pwd()
> > (6) get_task_fs_root()
> > (8) path_get()
> > (9) path_put()
> >
> > in one go and the justification in all patches amounts to "This is
> > common in some BPF LSM programs".
> >
> > So, broken stuff got exposed to users or at least a broken BPF LSM
> > program was written somewhere out there that is susceptible to UAFs
> > becauase you didn't restrict bpf_d_path() to trusted pointer arguments.
> > So you're now scrambling to fix this by asking for a bunch of low-level
> > exports.
> >
> > What is the guarantee that you don't end up writing another BPF LSM that
> > abuses these exports in a way that causes even more issues and then
> > someone else comes back asking for the next round of bpf funcs to be
> > exposed to fix it.
> 
> There is no guarantee.
> We made a safety mistake with bpf_d_path() though
> we restricted it very tight. And that UAF is tricky.
> I'm still amazed how Jann managed to find it.
> We all make mistakes.
> It's not the first one and not going to be the last.
> 
> What Matt is doing is an honest effort to fix it
> in the upstream kernel for all bpf users to benefit.
> He could have done it with a kernel module.
> The above "low level" helpers are all either static inline
> in .h or they call EXPORT_SYMBOL[_GPL] or simply inc/dec refcnt.
> 
> One can implement such kfuncs in an out of tree kernel module
> and be done with it, but in the bpf community we encourage
> everyone to upstream their work.
> 
> So kudos to Matt for working on these patches.
> 
> His bpf-lsm use case is not special.
> It just needs a safe way to call d_path.
> 
> +SEC("lsm.s/file_open")
> +__failure __msg("R1 must be referenced or trusted")
> +int BPF_PROG(path_d_path_kfunc_untrusted_from_current)
> +{
> +       struct path *pwd;
> +       struct task_struct *current;
> +
> +       current = bpf_get_current_task_btf();
> +       /* Walking a trusted pointer returned from bpf_get_current_task_btf()
> +        * yields and untrusted pointer. */
> +       pwd = &current->fs->pwd;
> +       bpf_path_d_path(pwd, buf, sizeof(buf));
> +       return 0;
> +}
> 
> This test checks that such an access pattern is unsafe and
> the verifier will catch it.
> 
> To make it safe one needs to do:
> 
>   current = bpf_get_current_task_btf();
>   pwd = bpf_get_task_fs_pwd(current);
>   if (!pwd) // error path
>   bpf_path_d_path(pwd, ...);
>   bpf_put_path(pwd);
> 
> these are the kfuncs from patch 6.
> 
> And notice that they have KF_ACQUIRE and KF_RELEASE flags.
> 
> They tell the verifier to recognize that bpf_get_task_fs_pwd()
> kfunc acquires 'struct path *'.
> Meaning that bpf prog cannot just return without releasing it.
> 
> The bpf prog cannot use-after-free that 'pwd' either
> after it was released by bpf_put_path(pwd).
> 
> The verifier static analysis catches such UAF-s.
> It didn't catch Jann's UAF earlier, because we didn't have
> these kfuncs! Hence the fix is to add such kfuncs with
> acquire/release semantics.
> 
> > The difference between a regular LSM asking about this and a BPF LSM
> > program is that we can see in the hook implementation what the LSM
> > intends to do with this and we can judge whether that's safe or not.
> 
> See above example.
> The verifier is doing a much better job than humans when it comes
> to safety.
> 
> > Here you're asking us to do this blindfolded.
> 
> If you don't trust the verifier to enforce safety,
> you shouldn't trust Rust compiler to generate safe code either.
> 
> In another reply you've compared kfuncs to EXPORT_SYMBOL_GPL.
> Such analogy is correct to some extent,
> but unlike exported symbols kfuncs are restricted to particular
> program types. They don't accept arbitrary pointers,
> and reference count is enforced as well.
> That's a pretty big difference vs EXPORT_SYMBOL.

There's one fundamental question here that we'll need an official answer to:

Is it ok for an out-of-tree BPF LSM program, that nobody has ever seen
to request access to various helpers in the kernel?

Because fundamentally this is what this patchset is asking to be done.

If the ZFS out-of-tree kernel module were to send us a similar patch
series asking us for a list of 9 functions that they'd like us to export
what would the answer to that be? It would be "no" - on principle alone.

So what is different between an out-of-tree BPF LSM program that no one
even has ever seen and an out-of-tree kernel module that one can at
least look at in Github? Why should we reject requests from the latter
but are supposed to accept requests from the former?

If we say yes to the BPF LSM program requests we would have to say yes
to ZFS as well.