Jakub Sitnicki wrote:
> On Tue, Mar 10, 2020 at 06:47 PM CET, Lorenz Bauer wrote:
> > Allow callers with CAP_NET_ADMIN to retrieve file descriptors from a
> > sockmap and sockhash. O_CLOEXEC is enforced on all fds.
> >
> > Without this, it's difficult to resize or otherwise rebuild existing
> > sockmap or sockhashes.
> >
> > Suggested-by: Jakub Sitnicki <jakub@xxxxxxxxxxxxxx>
> > Signed-off-by: Lorenz Bauer <lmb@xxxxxxxxxxxxxx>
> > ---
> >  net/core/sock_map.c | 19 +++++++++++++++++++
> >  1 file changed, 19 insertions(+)
> >
> > diff --git a/net/core/sock_map.c b/net/core/sock_map.c
> > index 03e04426cd21..3228936aa31e 100644
> > --- a/net/core/sock_map.c
> > +++ b/net/core/sock_map.c
> > @@ -347,12 +347,31 @@ static void *sock_map_lookup(struct bpf_map *map, void *key)
> >  static int __sock_map_copy_value(struct bpf_map *map, struct sock *sk,
> >  				  void *value)
> >  {
> > +	struct file *file;
> > +	int fd;
> > +
> >  	switch (map->value_size) {
> >  	case sizeof(u64):
> >  		sock_gen_cookie(sk);
> >  		*(u64 *)value = atomic64_read(&sk->sk_cookie);
> >  		return 0;
> >
> > +	case sizeof(u32):
> > +		if (!capable(CAP_NET_ADMIN))
> > +			return -EPERM;
> > +
> > +		fd = get_unused_fd_flags(O_CLOEXEC);
> > +		if (unlikely(fd < 0))
> > +			return fd;
> > +
> > +		read_lock_bh(&sk->sk_callback_lock);
> > +		file = get_file(sk->sk_socket->file);
>
> I think this deserves a second look.
>
> We don't lock the sock, so what if tcp_close orphans it before we enter
> this critical section? Looks like sk->sk_socket might be NULL.
>
> I'd find a test that tries to trigger the race helpful, like:
>
>   thread A: loop in lookup FD from map
>   thread B: loop in insert FD into map, close FD

Agreed, this was essentially my question above as well.

When the psock is created we call sock_hold() and will only do a sock_put()
after an rcu grace period when it's removed. So at least if you have the
sock here it should have a sk_refcnt.
(Note the user data is set to NULL, so if you do reference psock you need to
check it's non-null.)

Is that enough to ensure sk_socket? Seems not to me: tcp_close, for example,
will still happen and call sock_orphan(sk), based on my admittedly quick look.

Further, even if you do check that sk->sk_socket is non-null, what does it
mean to return a file with a socket that is closed, deleted from the sock_map,
and has its psock removed? At this point is it just a dangling reference?

I'm still a bit confused as well about what would or should happen when the
sock is closed after you have the file reference. I could probably dig up
what exactly would happen, but I think we need it in the commit message so we
understand it. I also didn't dig up the details here, but if the receiver of
the fd crashes or otherwise disappears, this hopefully all gets cleaned up?

> > +		read_unlock_bh(&sk->sk_callback_lock);
> > +
> > +		fd_install(fd, file);
> > +		*(u32 *)value = fd;
> > +		return 0;
> > +
> >  	default:
> >  		return -ENOSPC;
> >  	}