Re: [PATCH v4 2/12] bpf: introduce BPF token object

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Sep 13, 2023 at 2:46 PM Paul Moore <paul@xxxxxxxxxxxxxx> wrote:
>
> On Sep 12, 2023 Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> wrote:
> >
> > Add new kind of BPF kernel object, BPF token. BPF token is meant to
> > allow delegating privileged BPF functionality, like loading a BPF
> > program or creating a BPF map, from privileged process to a *trusted*
> > unprivileged process, all while have a good amount of control over which
> > privileged operations could be performed using provided BPF token.
> >
> > This is achieved through mounting BPF FS instance with extra delegation
> > mount options, which determine what operations are delegatable, and also
> > constraining it to the owning user namespace (as mentioned in the
> > previous patch).
> >
> > BPF token itself is just a derivative from BPF FS and can be created
> > through a new bpf() syscall command, BPF_TOKEN_CREAT, which accepts
> > a path specification (using the usual fd + string path combo) to a BPF
> > FS mount. Currently, BPF token "inherits" delegated command, map types,
> > prog type, and attach type bit sets from BPF FS as is. In the future,
> > having an BPF token as a separate object with its own FD, we can allow
> > to further restrict BPF token's allowable set of things either at the creation
> > time or after the fact, allowing the process to guard itself further
> > from, e.g., unintentionally trying to load undesired kind of BPF
> > programs. But for now we keep things simple and just copy bit sets as is.
> >
> > When BPF token is created from BPF FS mount, we take reference to the
> > BPF super block's owning user namespace, and then use that namespace for
> > checking all the {CAP_BPF, CAP_PERFMON, CAP_NET_ADMIN, CAP_SYS_ADMIN}
> > capabilities that are normally only checked against init userns (using
> > capable()), but now we check them using ns_capable() instead (if BPF
> > token is provided). See bpf_token_capable() for details.
> >
> > Such setup means that BPF token in itself is not sufficient to grant BPF
> > functionality. User namespaced process has to *also* have necessary
> > combination of capabilities inside that user namespace. So while
> > previously CAP_BPF was useless when granted within user namespace, now
> > it gains a meaning and allows container managers and sys admins to have
> > a flexible control over which processes can and need to use BPF
> > functionality within the user namespace (i.e., container in practice).
> > And BPF FS delegation mount options and derived BPF tokens serve as
> > a per-container "flag" to grant overall ability to use bpf() (plus further
> > restrict on which parts of bpf() syscalls are treated as namespaced).
> >
> > The alternative to creating BPF token object was:
> >   a) not having any extra object and just pasing BPF FS path to each
> >      relevant bpf() command. This seems suboptimal as it's racy (mount
> >      under the same path might change in between checking it and using it
> >      for bpf() command). And also less flexible if we'd like to further
> >      restrict ourselves compared to all the delegated functionality
> >      allowed on BPF FS.
> >   b) use non-bpf() interface, e.g., ioctl(), but otherwise also create
> >      a dedicated FD that would represent a token-like functionality. This
> >      doesn't seem superior to having a proper bpf() command, so
> >      BPF_TOKEN_CREATE was chosen.
> >
> > Signed-off-by: Andrii Nakryiko <andrii@xxxxxxxxxx>
> > ---
> >  include/linux/bpf.h            |  36 +++++++
> >  include/uapi/linux/bpf.h       |  39 +++++++
> >  kernel/bpf/Makefile            |   2 +-
> >  kernel/bpf/inode.c             |   4 +-
> >  kernel/bpf/syscall.c           |  17 +++
> >  kernel/bpf/token.c             | 189 +++++++++++++++++++++++++++++++++
> >  tools/include/uapi/linux/bpf.h |  39 +++++++
> >  7 files changed, 324 insertions(+), 2 deletions(-)
> >  create mode 100644 kernel/bpf/token.c
>
> ...
>
> > diff --git a/kernel/bpf/token.c b/kernel/bpf/token.c
> > new file mode 100644
> > index 000000000000..f6ea3eddbee6
> > --- /dev/null
> > +++ b/kernel/bpf/token.c
> > @@ -0,0 +1,189 @@
> > +#include <linux/bpf.h>
> > +#include <linux/vmalloc.h>
> > +#include <linux/anon_inodes.h>
> > +#include <linux/fdtable.h>
> > +#include <linux/file.h>
> > +#include <linux/fs.h>
> > +#include <linux/kernel.h>
> > +#include <linux/idr.h>
> > +#include <linux/namei.h>
> > +#include <linux/user_namespace.h>
> > +
> > +bool bpf_token_capable(const struct bpf_token *token, int cap)
> > +{
> > +     /* BPF token allows ns_capable() level of capabilities */
> > +     if (token) {
> > +             if (ns_capable(token->userns, cap))
> > +                     return true;
> > +             if (cap != CAP_SYS_ADMIN && ns_capable(token->userns, CAP_SYS_ADMIN))
> > +                     return true;
> > +     }
> > +     /* otherwise fallback to capable() checks */
> > +     return capable(cap) || (cap != CAP_SYS_ADMIN && capable(CAP_SYS_ADMIN));
> > +}
>
> While the above looks to be equivalent to the bpf_capable() function it
> replaces, for callers checking CAP_BPF and CAP_SYS_ADMIN, I'm looking
> quickly at patch 3/12 and this is also being used to replace a
> capable(CAP_NET_ADMIN) call which results in a change in behavior.
> The current code which performs a capable(CAP_NET_ADMIN) check cannot
> be satisfied by CAP_SYS_ADMIN, but this patchset using
> bpf_token_capable(token, CAP_NET_ADMIN) can be satisfied by either
> CAP_NET_ADMIN or CAP_SYS_ADMIN.
>
> It seems that while bpf_token_capable() can be used as a replacement
> for bpf_capable(), it is not currently a suitable replacement for a
> generic capable() call.  Perhaps this is intentional, but I didn't see
> it mentioned in the commit description, or in the comments, and I
> wanted to make sure it wasn't an oversight.

You are right. It is an intentional attempt to unify all such checks.
If you look at bpf_prog_load(), we have this:

if (is_net_admin_prog_type(type) && !capable(CAP_NET_ADMIN) &&
!capable(CAP_SYS_ADMIN))
    return -EPERM;

So seeing that, I realized that we did have an intent to always use
CAP_SYS_ADMIN as a "fallback" cap, even for CAP_NET_ADMIN when it
comes to using network-enabled BPF programs. So I decided that
unifying all this makes sense.

I'll add a comment mentioning this, I should have been more explicit
from the get go.

>
> > +void bpf_token_inc(struct bpf_token *token)
> > +{
> > +     atomic64_inc(&token->refcnt);
> > +}
> > +

[...]

> > +#define BPF_TOKEN_INODE_NAME "bpf-token"
> > +
> > +/* Alloc anon_inode and FD for prepared token.
> > + * Returns fd >= 0 on success; negative error, otherwise.
> > + */
> > +int bpf_token_new_fd(struct bpf_token *token)
> > +{
> > +     return anon_inode_getfd(BPF_TOKEN_INODE_NAME, &bpf_token_fops, token, O_CLOEXEC);
> > +}
> > +
> > +struct bpf_token *bpf_token_get_from_fd(u32 ufd)
> > +{
> > +     struct fd f = fdget(ufd);
> > +     struct bpf_token *token;
> > +
> > +     if (!f.file)
> > +             return ERR_PTR(-EBADF);
> > +     if (f.file->f_op != &bpf_token_fops) {
> > +             fdput(f);
> > +             return ERR_PTR(-EINVAL);
> > +     }
> > +
> > +     token = f.file->private_data;
> > +     bpf_token_inc(token);
> > +     fdput(f);
> > +
> > +     return token;
> > +}
> > +
> > +bool bpf_token_allow_cmd(const struct bpf_token *token, enum bpf_cmd cmd)
> > +{
> > +     if (!token)
> > +             return false;
> > +
> > +     return token->allowed_cmds & (1ULL << cmd);
> > +}
>
> I mentioned this a while back, likely in the other threads where this
> token-based approach was only being discussed in general terms, but I
> think we want to have a LSM hook at the point of initial token
> delegation for this and a hook when the token is used.  My initial
> thinking is that we should be able to address the former with a hook
> in bpf_fill_super() and the latter either in bpf_token_get_from_fd()
> or bpf_token_allow_XXX(); bpf_token_get_from_fd() would be simpler,
> but it doesn't allow for much in the way of granularity.  Inserting the
> LSM hooks in bpf_token_allow_XXX() would also allow the BPF code to fall
> gracefully fallback to the system-wide checks if the LSM denied the
> requested access whereas an access denial in bpf_token_get_from_fd()
> denial would cause the operation to error out.

I think the bpf_fill_super() LSM hook makes sense, but I thought
someone mentioned that we already have some generic LSM hook for
validating mounts? If we don't, I can certainly add one for BPF FS
specifically.

As for the bpf_token_allow_xxx(). This feels a bit too specific and
narrow-focused. What if we later add yet another dimension for BPF FS
and token? Do we need to introduce yet another LSM for each such case?
But also see bpf_prog_load(). There are two checks, allow_prog_type
and allow_attach_type, which are really only meaningful in
combination. And yet you'd have to have two separate LSM hooks for
that.

So I feel like the better approach is less mechanistically
concentrating on BPF token operations themselves, but rather on more
semantically meaningful operations that are token-enabled. E.g.,
protect BPF program loading, BPF map creation, BTF loading, etc. And
we do have such LSM hooks right now, though they might not be the most
convenient. So perhaps the right move is to add new ones that would
provide a bit more context (e.g., we can pass in the BPF token that
was used for the operation, attributes with which map/prog was
created, etc). Low-level token LSMs seem hard to use cohesively in
practice, though.

WDYT?

>
> --
> paul-moore.com




[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux