On 04/17, Martin KaFai Lau wrote: > On 4/14/23 6:55 PM, Stanislav Fomichev wrote: > > On 04/13, Stanislav Fomichev wrote: > > > On Thu, Apr 13, 2023 at 7:38 AM Aleksandr Mikhalitsyn > > > <aleksandr.mikhalitsyn@xxxxxxxxxxxxx> wrote: > > > > > > > > On Thu, Apr 13, 2023 at 4:22 PM Eric Dumazet <edumazet@xxxxxxxxxx> wrote: > > > > > > > > > > On Thu, Apr 13, 2023 at 3:35 PM Alexander Mikhalitsyn > > > > > <aleksandr.mikhalitsyn@xxxxxxxxxxxxx> wrote: > > > > > > > > > > > > During work on SO_PEERPIDFD, it was discovered (thanks to Christian), > > > > > > that bpf cgroup hook can cause FD leaks when used with sockopts which > > > > > > install FDs into the process fdtable. > > > > > > > > > > > > After some offlist discussion it was proposed to add a blacklist of > > > > > > > > > > We try to replace this word by either denylist or blocklist, even in changelogs. > > > > > > > > Hi Eric, > > > > > > > > Oh, I'm sorry about that. :( Sure. > > > > > > > > > > > > > > > socket options those can cause troubles when BPF cgroup hook is enabled. > > > > > > > > > > > > > > > > Can we find the appropriate Fixes: tag to help stable teams ? > > > > > > > > Sure, I will add next time. > > > > > > > > Fixes: 0d01da6afc54 ("bpf: implement getsockopt and setsockopt hooks") > > > > > > > > I think it's better to add Stanislav Fomichev to CC. > > > > > > Can we use 'struct proto' bpf_bypass_getsockopt instead? We already > > > use it for tcp zerocopy, I'm assuming it should work in this case as > > > well? > > > > Jakub reminded me of the other things I wanted to ask here bug forgot: > > > > - setsockopt is probably not needed, right? setsockopt hook triggers > > before the kernel and shouldn't leak anything > > - for getsockopt, instead of bypassing bpf completely, should we instead > > ignore the error from the bpf program? that would still preserve > > the observability aspect > > stealing this thread to discuss the optlen issue which may make sense to > bypass also. > > There has been issue with optlen. Other than this older post related to > optlen > PAGE_SIZE: > https://lore.kernel.org/bpf/5c8b7d59-1f28-2284-f7b9-49d946f2e982@xxxxxxxxx/, > the recent one related to optlen that we have seen is > NETLINK_LIST_MEMBERSHIPS. The userspace passed in optlen == 0 and the kernel > put the expected optlen (> 0) and 'return 0;' to userspace. The userspace > intention is to learn the expected optlen. This makes 'ctx.optlen > > max_optlen' and __cgroup_bpf_run_filter_getsockopt() ends up returning > -EFAULT to the userspace even the bpf prog has not changed anything. (ignoring -EFAULT issue) this seems like it needs to be if (optval && (ctx.optlen > max_optlen || ctx.optlen < 0)) { /* error */ } ? > Does it make sense to also bypass the bpf prog when 'ctx.optlen > > max_optlen' for now (and this can use a separate patch which as usual > requires a bpf selftests)? Yeah, makes sense. Replacing this -EFAULT with WARN_ON_ONCE or something seems like the way to go. It caused too much trouble already :-( Should I prepare a patch or do you want to take a stab at it? > In the future, does it make sense to have a specific cgroup-bpf-prog (a > specific attach type?) that only uses bpf_dynptr kfunc to access the optval > such that it can enforce read-only for some optname and potentially also > track if bpf-prog has written a new optval? The bpf-prog can only return 1 > (OK) and only allows using bpf_set_retval() instead. Likely there is still > holes but could be a seed of thought to continue polishing the idea. Ack, let's think about it. Maybe we should re-evaluate 'getsockopt-happens-after-the-kernel' idea as well? If we can have a sleepable hook that can copy_from_user/copy_to_user, and we have a mostly working bpf_getsockopt (after your refactoring), I don't see why we need to continue the current scheme of triggering after the kernel? > > - or maybe we can even have a per-proto bpf_getsockopt_cleanup call that > > gets called whenever bpf returns an error to make sure protocols have > > a chance to handle that condition (and free the fd) > > > >