Re: [PATCH v2 bpf-next 1/4] bpf: unprivileged BPF access via /dev/bpf

Andy Lutomirski <luto@xxxxxxxxxx> · Sun, 4 Aug 2019 22:47:22 -0700

On Sun, Aug 4, 2019 at 5:08 PM Andy Lutomirski <luto@xxxxxxxxxx> wrote:
>
> On Sun, Aug 4, 2019 at 3:16 PM Andy Lutomirski <luto@xxxxxxxxxx> wrote:
> >
> > On Fri, Aug 2, 2019 at 12:22 AM Song Liu <songliubraving@xxxxxx> wrote:
> > >
> > > Hi Andy,
> > >
> >  >> I actually agree CAP_BPF_ADMIN makes sense. The hard part is to make
> > > >> existing tools (setcap, getcap, etc.) and libraries aware of the new CAP.
> > > >
> > > > It's been done before -- it's not that hard.  IMO the main tricky bit
> > > > would be try be somewhat careful about defining exactly what
> > > > CAP_BPF_ADMIN does.
> > >
> > > Agreed. I think defining CAP_BPF_ADMIN could be a good topic for the
> > > Plumbers conference.
> > >
> > > OTOH, I don't think we have to wait for CAP_BPF_ADMIN to allow daemons
> > > like systemd to do sys_bpf() without root.
> >
> > I don't understand the use case here.  Are you talking about systemd
> > --user?  As far as I know, a user is expected to be able to fully
> > control their systemd --user process, so giving it unrestricted bpf
> > access is very close to giving it superuser access, and this doesn't
> > sound like a good idea.  I think that, if systemd --user needs bpf(),
> > it either needs real unprivileged bpf() or it needs a privileged
> > helper (SUID or a daemon) to intermediate this access.
> >
> > >
> > > >
> > > >>> I don't see why you need to invent a whole new mechanism for this.
> > > >>> The entire cgroup ecosystem outside bpf() does just fine using the
> > > >>> write permission on files in cgroupfs to control access.  Why can't
> > > >>> bpf() do the same thing?
> > > >>
> > > >> It is easier to use write permission for BPF_PROG_ATTACH. But it is
> > > >> not easy to do the same for other bpf commands: BPF_PROG_LOAD and
> > > >> BPF_MAP_*. A lot of these commands don't have target concept. Maybe
> > > >> we should have target concept for all these commands. But that is a
> > > >> much bigger project. OTOH, "all or nothing" model allows all these
> > > >> commands at once.
> > > >
> > > > For BPF_PROG_LOAD, I admit I've never understood why permission is
> > > > required at all.  I think that CAP_SYS_ADMIN or similar should be
> > > > needed to get is_priv in the verifier, but I think that should mainly
> > > > be useful for tracing, and that requires lots of privilege anyway.
> > > > BPF_MAP_* is probably the trickiest part.  One solution would be some
> > > > kind of bpffs, but I'm sure other solutions are possible.
> > >
> > > Improving permission management of cgroup_bpf is another good topic to
> > > discuss. However, it is also an overkill for current use case.
> > >
> >
> > I looked at the code some more, and I don't think this is so hard
> > after all.  As I understand it, all of the map..by_id stuff is, to
> > some extent, deprecated in favor of persistent maps.  As I see it, the
> > map..by_id calls should require privilege forever, although I can
> > imagine ways to scope that privilege to a namespace if the maps
> > themselves were to be scoped to a namespace.
> >
> > Instead, unprivileged tools would use the persistent map interface
> > roughly like this:
> >
> > $ bpftool map create /sys/fs/bpf/my_dir/filename type hash key 8 value
> > 8 entries 64 name mapname
> >
> > This would require that the caller have either CAP_DAC_OVERRIDE or
> > that the caller have permission to create files in /sys/fs/bpf/my_dir
> > (using the same rules as for any filesystem), and the resulting map
> > would end up owned by the creating user and have mode 0600 (or maybe
> > 0666, or maybe a new bpf_attr parameter) modified by umask.  Then all
> > the various capable() checks that are currently involved in accessing
> > a persistent map would instead check FMODE_READ or FMODE_WRITE on the
> > map file as appropriate.
> >
> > Half of this stuff already works.  I just set my system up like this:
> >
> > $ ls -l /sys/fs/bpf
> > total 0
> > drwxr-xr-x. 3 luto luto 0 Aug  4 15:10 luto
> >
> > $ mkdir /sys/fs/bpf/luto/test
> >
> > $ ls -l /sys/fs/bpf/luto
> > total 0
> > drwxrwxr-x. 2 luto luto 0 Aug  4 15:10 test
> >
> > I bet that making the bpf() syscalls work appropriately in this
> > context without privilege would only be a couple of hours of work.
> > The hard work, creating bpffs and making it function, is already done
> > :)
> >
> > P.S. The docs for bpftool create are less than fantastic.  The
> > complete lack of any error message at all when the syscall returns
> > -EACCES is also not fantastic.
>
> This isn't remotely finished, but I spent a bit of time fiddling with this:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/commit/?h=bpf/perms
>
> What do you think?  (It's obviously not done.  It doesn't compile, and
> I haven't gotten to the permissions needed to do map operations.  I
> also haven't touched the capable() checks.)

I updated the branch.  It compiles, and basic map functionality works!

# mount -t bpf bpf /sys/fs/bpf
# cd /sys/fs/bpf
# mkdir luto
# chown luto: luto
# setpriv --euid=1000 --ruid=1000 bash
$ pwd
/sys/fs/bpf
bash-5.0$ ls -l
total 0
drwxr-xr-x 2 luto luto 0 Aug  4 22:41 luto
bash-5.0$ bpftool map create /sys/fs/bpf/luto/filename type hash key 8
value 8 entries 64 name mapname
bash-5.0$ bpftool map dump pinned /sys/fs/bpf/luto/filename
Found 0 elements

# chown root: /sys/fs/bpf/luto/filename

$ bpftool map dump pinned /sys/fs/bpf/luto/filename
Error: bpf obj get (/sys/fs/bpf/luto): Permission denied

So I think it's possible to get a respectable subset of bpf()
functionality working without privilege in short order :)

(FWIW, a decent fraction of this probably works even without my
patches, but it's going to have nonsensical semantics and may fail for
silly reasons.)