Re: [PATCH net-next v4 2/4] net: socket: add sockopts blacklist for BPF cgroup hook

Christian Brauner <brauner@xxxxxxxxxx> · Mon, 17 Apr 2023 16:42:30 +0200

On Fri, Apr 14, 2023 at 06:55:39PM -0700, Stanislav Fomichev wrote:
> On 04/13, Stanislav Fomichev wrote:
> > On Thu, Apr 13, 2023 at 7:38 AM Aleksandr Mikhalitsyn
> > <aleksandr.mikhalitsyn@xxxxxxxxxxxxx> wrote:
> > >
> > > On Thu, Apr 13, 2023 at 4:22 PM Eric Dumazet <edumazet@xxxxxxxxxx> wrote:
> > > >
> > > > On Thu, Apr 13, 2023 at 3:35 PM Alexander Mikhalitsyn
> > > > <aleksandr.mikhalitsyn@xxxxxxxxxxxxx> wrote:
> > > > >
> > > > > During work on SO_PEERPIDFD, it was discovered (thanks to Christian),
> > > > > that bpf cgroup hook can cause FD leaks when used with sockopts which
> > > > > install FDs into the process fdtable.
> > > > >
> > > > > After some offlist discussion it was proposed to add a blacklist of
> > > >
> > > > We try to replace this word by either denylist or blocklist, even in changelogs.
> > >
> > > Hi Eric,
> > >
> > > Oh, I'm sorry about that. :( Sure.
> > >
> > > >
> > > > > socket options those can cause troubles when BPF cgroup hook is enabled.
> > > > >
> > > >
> > > > Can we find the appropriate Fixes: tag to help stable teams ?
> > >
> > > Sure, I will add next time.
> > >
> > > Fixes: 0d01da6afc54 ("bpf: implement getsockopt and setsockopt hooks")
> > >
> > > I think it's better to add Stanislav Fomichev to CC.
> > 
> > Can we use 'struct proto' bpf_bypass_getsockopt instead? We already
> > use it for tcp zerocopy, I'm assuming it should work in this case as
> > well?
> 
> Jakub reminded me of the other things I wanted to ask here bug forgot:
> 
> - setsockopt is probably not needed, right? setsockopt hook triggers
>   before the kernel and shouldn't leak anything
> - for getsockopt, instead of bypassing bpf completely, should we instead
>   ignore the error from the bpf program? that would still preserve

That's fine by me as well.

It'd be great if the net folks could tell Alex how they would want this
handled.

>   the observability aspect

Please see for more details
https://lore.kernel.org/lkml/20230411-nudelsalat-spreu-3038458f25c4@brauner

> - or maybe we can even have a per-proto bpf_getsockopt_cleanup call that
>   gets called whenever bpf returns an error to make sure protocols have
>   a chance to handle that condition (and free the fd)

Installing an fd into an fdtable makes it visible to userspace at which
point calling close_fd() is doable but an absolute last resort and
generally a good indicator of misdesign. If the bpf hook wants to make
decisions based on the file then it should receive a struct
file, not an fd.