On Mon, 12 Aug 2019 at 14:28, Daniel Borkmann <daniel@xxxxxxxxxxxxx> wrote: > [...] > > diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c > > index 59b57d708697..c3447bad608a 100644 > > --- a/net/xdp/xsk.c > > +++ b/net/xdp/xsk.c > > @@ -362,6 +362,50 @@ static void xsk_unbind_dev(struct xdp_sock *xs) > > dev_put(dev); > > } > > > > +static struct xsk_map *xsk_get_map_list_entry(struct xdp_sock *xs, > > + struct xdp_sock ***map_entry) > > +{ > > + struct xsk_map *map = NULL; > > + struct xsk_map_node *node; > > + > > + *map_entry = NULL; > > + > > + spin_lock_bh(&xs->map_list_lock); > > + node = list_first_entry_or_null(&xs->map_list, struct xsk_map_node, > > + node); > > + if (node) { > > + WARN_ON(xsk_map_inc(node->map)); > > Can you elaborate on the refcount usage here and against what scenario it is protecting? > Thanks for having a look! First we access the map_list (under the lock) and pull out the map which we intend to clean. In order to clear the map entry, we need to a reference to the map. However, when the map_list_lock is released, there's a window where the map entry can be cleared and the map can be destroyed, and making the "map", which is used in xsk_delete_from_maps, stale. To guarantee existence the additional refinc is required. Makes sense? > Do we pretend it never fails on the bpf_map_inc() wrt the WARN_ON(), > why that (what makes it different from the xsk_map_node_alloc() inc > above where we do error out)? Hmm, given that we're in a cleanup (socket release), we can't really return any error. What would be a more robust way? Retrying? AFAIK the release ops return an int, but it's not checked/used. > > + map = node->map; > > + *map_entry = node->map_entry; > > + } > > + spin_unlock_bh(&xs->map_list_lock); > > + return map; > > +} > > + > > +static void xsk_delete_from_maps(struct xdp_sock *xs) > > +{ > > + /* This function removes the current XDP socket from all the > > + * maps it resides in. We need to take extra care here, due to > > + * the two locks involved. Each map has a lock synchronizing > > + * updates to the entries, and each socket has a lock that > > + * synchronizes access to the list of maps (map_list). For > > + * deadlock avoidance the locks need to be taken in the order > > + * "map lock"->"socket map list lock". We start off by > > + * accessing the socket map list, and take a reference to the > > + * map to guarantee existence. Then we ask the map to remove > > + * the socket, which tries to remove the socket from the > > + * map. Note that there might be updates to the map between > > + * xsk_get_map_list_entry() and xsk_map_try_sock_delete(). > > + */ I tried to clarify here, but I obviously need to do a better job. :-) Björn