Re: [PATCH v6 3/13] bpf: introduce BPF token object

Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> · Thu, 12 Oct 2023 16:45:20 -0700

On Thu, Oct 12, 2023 at 4:19 PM Paul Moore <paul@xxxxxxxxxxxxxx> wrote:
>
> On Wed, Oct 11, 2023 at 8:31 PM Andrii Nakryiko
> <andrii.nakryiko@xxxxxxxxx> wrote:
> > On Tue, Oct 10, 2023 at 6:17 PM Paul Moore <paul@xxxxxxxxxxxxxx> wrote:
> > > On Sep 27, 2023 Andrii Nakryiko <andrii@xxxxxxxxxx> wrote:
> > > >
> > > > Add new kind of BPF kernel object, BPF token. BPF token is meant to
> > > > allow delegating privileged BPF functionality, like loading a BPF
> > > > program or creating a BPF map, from privileged process to a *trusted*
> > > > unprivileged process, all while have a good amount of control over which
> > > > privileged operations could be performed using provided BPF token.
> > > >
> > > > This is achieved through mounting BPF FS instance with extra delegation
> > > > mount options, which determine what operations are delegatable, and also
> > > > constraining it to the owning user namespace (as mentioned in the
> > > > previous patch).
> > > >
> > > > BPF token itself is just a derivative from BPF FS and can be created
> > > > through a new bpf() syscall command, BPF_TOKEN_CREAT, which accepts
> > > > a path specification (using the usual fd + string path combo) to a BPF
> > > > FS mount. Currently, BPF token "inherits" delegated command, map types,
> > > > prog type, and attach type bit sets from BPF FS as is. In the future,
> > > > having an BPF token as a separate object with its own FD, we can allow
> > > > to further restrict BPF token's allowable set of things either at the creation
> > > > time or after the fact, allowing the process to guard itself further
> > > > from, e.g., unintentionally trying to load undesired kind of BPF
> > > > programs. But for now we keep things simple and just copy bit sets as is.
> > > >
> > > > When BPF token is created from BPF FS mount, we take reference to the
> > > > BPF super block's owning user namespace, and then use that namespace for
> > > > checking all the {CAP_BPF, CAP_PERFMON, CAP_NET_ADMIN, CAP_SYS_ADMIN}
> > > > capabilities that are normally only checked against init userns (using
> > > > capable()), but now we check them using ns_capable() instead (if BPF
> > > > token is provided). See bpf_token_capable() for details.
> > > >
> > > > Such setup means that BPF token in itself is not sufficient to grant BPF
> > > > functionality. User namespaced process has to *also* have necessary
> > > > combination of capabilities inside that user namespace. So while
> > > > previously CAP_BPF was useless when granted within user namespace, now
> > > > it gains a meaning and allows container managers and sys admins to have
> > > > a flexible control over which processes can and need to use BPF
> > > > functionality within the user namespace (i.e., container in practice).
> > > > And BPF FS delegation mount options and derived BPF tokens serve as
> > > > a per-container "flag" to grant overall ability to use bpf() (plus further
> > > > restrict on which parts of bpf() syscalls are treated as namespaced).
> > > >
> > > > The alternative to creating BPF token object was:
> > > >   a) not having any extra object and just pasing BPF FS path to each
> > > >      relevant bpf() command. This seems suboptimal as it's racy (mount
> > > >      under the same path might change in between checking it and using it
> > > >      for bpf() command). And also less flexible if we'd like to further
> > > >      restrict ourselves compared to all the delegated functionality
> > > >      allowed on BPF FS.
> > > >   b) use non-bpf() interface, e.g., ioctl(), but otherwise also create
> > > >      a dedicated FD that would represent a token-like functionality. This
> > > >      doesn't seem superior to having a proper bpf() command, so
> > > >      BPF_TOKEN_CREATE was chosen.
> > > >
> > > > Signed-off-by: Andrii Nakryiko <andrii@xxxxxxxxxx>
> > > > ---
> > > >  include/linux/bpf.h            |  40 +++++++
> > > >  include/uapi/linux/bpf.h       |  39 +++++++
> > > >  kernel/bpf/Makefile            |   2 +-
> > > >  kernel/bpf/inode.c             |  10 +-
> > > >  kernel/bpf/syscall.c           |  17 +++
> > > >  kernel/bpf/token.c             | 197 +++++++++++++++++++++++++++++++++
> > > >  tools/include/uapi/linux/bpf.h |  39 +++++++
> > > >  7 files changed, 339 insertions(+), 5 deletions(-)
> > > >  create mode 100644 kernel/bpf/token.c
> > > >
> > > > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > > > index a5bd40f71fd0..c43131a24579 100644
> > > > --- a/include/linux/bpf.h
> > > > +++ b/include/linux/bpf.h
> > > > @@ -1572,6 +1576,13 @@ struct bpf_mount_opts {
> > > >       u64 delegate_attachs;
> > > >  };
> > > >
> > > > +struct bpf_token {
> > > > +     struct work_struct work;
> > > > +     atomic64_t refcnt;
> > > > +     struct user_namespace *userns;
> > > > +     u64 allowed_cmds;
> > >
> > > We'll also need a 'void *security' field to go along with the BPF token
> > > allocation/creation/free hooks, see my comments below.  This is similar
> > > to what we do for other kernel objects.
> >
> > ok, I'm thinking of adding a dedicated patch for all the
> > security-related stuff and refactoring of existing LSM hook(s).
>
> No objection here.  My main concern is that we get the LSM stuff in
> the same patchset as the rest of the BPF token patches; if all of the
> LSM bits are in a separate patch I'm not bothered.
>
> Once we settle on the LSM hooks I'll draft a SELinux implementation of
> the hooks which I'll hand off to you for inclusion in the patchset as
> well.  I'd encourage the other LSMs that are interested to do the
> same.

I posted v7 ([0]) a few minutes ago, please take a look when you get a
chance. All the LSM-related stuff is in a few separate patches.

  [0] https://patchwork.kernel.org/project/netdevbpf/list/?series=792765&state=*

>
> > > > +bool bpf_token_capable(const struct bpf_token *token, int cap)
> > > > +{
> > > > +     /* BPF token allows ns_capable() level of capabilities */
> > > > +     if (token) {
> > >
> > > I think we want a LSM hook here before the token is used in the
> > > capability check.  The LSM will see the capability check, but it will
> > > not be able to distinguish it from the process which created the
> > > delegation token.  This is arguably the purpose of the delegation, but
> > > with the LSM we want to be able to control who can use the delegated
> > > privilege.  How about something like this:
> > >
> > >   if (security_bpf_token_capable(token, cap))
> > >      return false;
> >
> > sounds good, I'll add this hook
> >
> > btw, I'm thinking of guarding the BPF_TOKEN_CREATE command behind the
> > ns_capable(CAP_BPF) check, WDYT? This seems appropriate. You can get
> > BPF token only if you have CAP_BPF **within the userns**, so any
> > process not granted CAP_BPF within namespace ("container") is
> > guaranteed to not be able to do anything with BPF token.
>
> I don't recall seeing any capability checks guarding BPF_TOKEN_CREATE
> in this revision so I don't think you're weakening the restrictions
> any, and the logic above seems reasonable: if you don't have CAP_BPF
> you shouldn't be creating a token.

Right, there was none based on the assumption that you have to have an
access to BPF FS controlled through other means. But it does seem
right to use ns_capable(CAP_BPF) in addition to all that, so that's
what I did in v7.

>
> > > > +static struct bpf_token *bpf_token_alloc(void)
> > > > +{
> > > > +     struct bpf_token *token;
> > > > +
> > > > +     token = kvzalloc(sizeof(*token), GFP_USER);
> > > > +     if (!token)
> > > > +             return NULL;
> > > > +
> > > > +     atomic64_set(&token->refcnt, 1);
> > >
> > > We should have a LSM hook here to allocate the LSM state associated
> > > with the token.
> > >
> > >   if (security_bpf_token_alloc(token)) {
> > >     kvfree(token);
> > >     return NULL;
> > >   }
> > >
> > > > +     return token;
> > > > +}
> > >
> > > ...
> > >
> >
> > Would having userns and allowed_* masks filled out by that time inside
> > the token be useful (seems so if we treat bpf_token_alloc as generic
> > LSM hook). If yes, I'll add security_bpf_token_alloc() after all that
> > is filled out, right before we try to get unused fd. WDYT?
>
> The security_bpf_token_alloc() hook isn't intended to do any access
> control, it's just there so that the LSMs which need to allocate state
> for the token object can do so at the same time the token is
> allocated.  It has been my experience that allocating and releasing
> the LSM state at the same time as the primary object's state is much
> less fragile than disconnecting the two lifetimes and allocating the
> LSM state later.

You'll be able to see what I did in v7, but I don't think separate
lifetimes are a concern, because all these alloc/free hooks are called
just as bpf_{prog,map,token} are created or right when they are freed,
so in that regard everything is good.

But based on our discussion to rework bpf_prog_alloc/bpf_map_alloc
into bpf_prog_load/bpf_map_create, it would be inconsistent between
prog/map and token. Anyways, the code is out, take a look. If you hate
it, it's just a matter of adding one extra LSM hook for token, map,
and prog each.

>
> > > > +int bpf_token_create(union bpf_attr *attr)
> > > > +{
> > > > +     struct bpf_mount_opts *mnt_opts;
> > > > +     struct bpf_token *token = NULL;
> > > > +     struct inode *inode;
> > > > +     struct file *file;
> > > > +     struct path path;
> > > > +     umode_t mode;
> > > > +     int err, fd;
> > > > +
> > > > +     err = user_path_at(attr->token_create.bpffs_path_fd,
> > > > +                        u64_to_user_ptr(attr->token_create.bpffs_pathname),
> > > > +                        LOOKUP_FOLLOW | LOOKUP_EMPTY, &path);
> > > > +     if (err)
> > > > +             return err;
> > > > +
> > > > +     if (path.mnt->mnt_root != path.dentry) {
> > > > +             err = -EINVAL;
> > > > +             goto out_path;
> > > > +     }
> > > > +     err = path_permission(&path, MAY_ACCESS);
> > > > +     if (err)
> > > > +             goto out_path;
> > > > +
> > > > +     mode = S_IFREG | ((S_IRUSR | S_IWUSR) & ~current_umask());
> > > > +     inode = bpf_get_inode(path.mnt->mnt_sb, NULL, mode);
> > > > +     if (IS_ERR(inode)) {
> > > > +             err = PTR_ERR(inode);
> > > > +             goto out_path;
> > > > +     }
> > > > +
> > > > +     inode->i_op = &bpf_token_iops;
> > > > +     inode->i_fop = &bpf_token_fops;
> > > > +     clear_nlink(inode); /* make sure it is unlinked */
> > > > +
> > > > +     file = alloc_file_pseudo(inode, path.mnt, BPF_TOKEN_INODE_NAME, O_RDWR, &bpf_token_fops);
> > > > +     if (IS_ERR(file)) {
> > > > +             iput(inode);
> > > > +             err = PTR_ERR(file);
> > > > +             goto out_file;
> > > > +     }
> > > > +
> > > > +     token = bpf_token_alloc();
> > > > +     if (!token) {
> > > > +             err = -ENOMEM;
> > > > +             goto out_file;
> > > > +     }
> > > > +
> > > > +     /* remember bpffs owning userns for future ns_capable() checks */
> > > > +     token->userns = get_user_ns(path.dentry->d_sb->s_user_ns);
> > > > +
> > > > +     mnt_opts = path.dentry->d_sb->s_fs_info;
> > > > +     token->allowed_cmds = mnt_opts->delegate_cmds;
> > >
> > > I think we would want a LSM hook here, both to control the creation
> > > of the token and mark it with the security attributes of the creating
> > > process.  How about something like this:
> > >
> > >   err = security_bpf_token_create(token);
> > >   if (err)
> > >     goto out_token;
> >
> > hmm... so you'd like both security_bpf_token_alloc() and
> > security_bpf_token_create()? They seem almost identical, do we need
> > two? Or is it that the security_bpf_token_alloc() is supposed to be
> > only used to create those `void *security` context pieces, while
> > security_bpf_token_create() is actually going to be used for
> > enforcement?
>
> I tried to explain a bit in my comment above, but the alloc hook
> basically just does the LSM state allocation whereas the token_create
> hook does the access control and setting of the LSM state associated
> with the token.
>
> If you want to get rid of the bpf_token_alloc() function and fold it
> into the bpf_token_create() function then we can go down to one LSM
> hook that covers state allocation, initialization, and access control.
> However, if it remains possible to allocate a token object outside of
> bpf_token_create() I think it is a good idea to keep the dedicated LSM
> allocation hooks.
>
> You can apply the same logic to the LSM token state destructor hook.

Yes, so that's what I went with: combining allocation and access
control. And no, there is no way to get a prog/map/token without
having a respective alloc/control hook.

I also called out the fact that we might be having a memory leak when
using multiple LSMs together and them disagreeing on alloc hook
return. You'll see in the v7 what I mean, I mention that in the commit
messages.

>
> > For my own education, is there some explicit flag or some
> > other sort of mark between LSM hooks for setting up security vs
> > enforcement? Or is it mostly based on convention and implicitly
> > following the split?
>
> Generally convention based around trying to match the LSM state
> lifetime with the lifetime of the associated object; as I mentioned
> earlier, we've had problems in the past when the two differ.  If you
> look at all of the LSM hooks you'll see a number of
> "security_XXXX_alloc()" hooks for things like superblocks, inodes,
> files, task structs, creds, and the like; we're just doing the same
> things here with BPF tokens.  If you're still not convinced, it may be
> worth noting that we currently have security_bpf_map_alloc() and
> security_bpf_prog_alloc() hooks.

It's not like I'm convinced or not, I'm trying to keep things
consistent without breaking anything or causing a mess, but also
hopefully not add too many unnecessary hooks. Let's continue on
respective patches in v7, I'm curious to hear your comments based on
the code I posted.

>
> > > > +     fd = get_unused_fd_flags(O_CLOEXEC);
> > > > +     if (fd < 0) {
> > > > +             err = fd;
> > > > +             goto out_token;
> > > > +     }
> > > > +
> > > > +     file->private_data = token;
> > > > +     fd_install(fd, file);
> > > > +
> > > > +     path_put(&path);
> > > > +     return fd;
> > > > +
> > > > +out_token:
> > > > +     bpf_token_free(token);
> > > > +out_file:
> > > > +     fput(file);
> > > > +out_path:
> > > > +     path_put(&path);
> > > > +     return err;
> > > > +}
> > >
> > > ...
> > >
> > > > +bool bpf_token_allow_cmd(const struct bpf_token *token, enum bpf_cmd cmd)
> > > > +{
> > > > +     if (!token)
> > > > +             return false;
> > > > +
> > > > +     return token->allowed_cmds & (1ULL << cmd);
> > >
> > > Similar to bpf_token_capable(), I believe we want a LSM hook here to
> > > control who is allowed to use the delegated privilege.
> > >
> > >   bool bpf_token_allow_cmd(...)
> > >   {
> > >     if (token && (token->allowed_cmds & (1ULL << cmd))
> > >       return security_bpf_token_cmd(token, cmd);
> >
> > ok, so I guess I'll have to add all four variants:
> > security_bpf_token_{cmd,map_type,prog_type,attach_type}, right?
>
> Not necessarily.  Currently only SELinux provides a set of LSM BPF
> access controls, and it only concerns itself with BPF maps and
> programs.  From a map perspective it comes down to controlling which
> applications can create a map, or use a map created elsewhere (we
> label BPF map objects just as we would any other kernel object).  From
> a program perspective, it is about loading programs and executing
> them.  While not BPF specific, SELinux also provides controls that
> restrict the ability to exercise capabilities.
>
> Keeping the above access controls in mind, I'm hopeful you can better
> understand the reasoning behind the hooks I'm proposing.  The
> security_bpf_token_cmd() hook is there so that we can control a
> token's ability to authorize BPF_MAP_CREATE and BPF_PROG_LOAD.  The
> security_bpf_token_capable() hook is there so that we can control a
> token's ability to authorize the BPF-related capabilities.  While I
> guess it's possible someone might develop a security model for BPF
> that would require the other LSM hooks you've mentioned above, but I
> see no need for that now, and to be honest I don't see any need on the
> visible horizon either.
>
> Does that make sense?

I'll reply to your second email :)

>
> --
> paul-moore.com