On 05/06/2019 12:04 PM, Daniel Borkmann wrote: > On 05/04/2019 06:06 PM, Björn Töpel wrote: >> From: Björn Töpel <bjorn.topel@xxxxxxxxx> >> >> When an AF_XDP socket is released/closed the XSKMAP still holds a >> reference to the socket in a "released" state. The socket will still >> use the netdev queue resource, and block newly created sockets from >> attaching to that queue, but no user application can access the >> fill/complete/rx/tx rings. This results in that all applications need >> to explicitly clear the map entry from the old "zombie state" >> socket. This should be done automatically. >> >> After this patch, when a socket is released, it will remove itself >> from all the XSKMAPs it resides in, allowing the socket application to >> remove the code that cleans the XSKMAP entry. >> >> This behavior is also closer to that of SOCKMAP, making the two socket >> maps more consistent. >> >> Reported-by: Bruce Richardson <bruce.richardson@xxxxxxxxx> >> Signed-off-by: Björn Töpel <bjorn.topel@xxxxxxxxx> > [...] > > >> +static void __xsk_map_delete_elem(struct xsk_map *map, >> + struct xdp_sock **map_entry) >> +{ >> + struct xdp_sock *old_xs; >> + >> + spin_lock_bh(&map->lock); >> + old_xs = xchg(map_entry, NULL); >> + if (old_xs) >> + xsk_map_del_node(old_xs, map_entry); >> + spin_unlock_bh(&map->lock); >> + >> +} >> + >> static void xsk_map_free(struct bpf_map *map) >> { >> struct xsk_map *m = container_of(map, struct xsk_map, map); >> @@ -78,15 +142,16 @@ static void xsk_map_free(struct bpf_map *map) >> bpf_clear_redirect_map(map); >> synchronize_net(); >> >> + spin_lock_bh(&m->lock); >> for (i = 0; i < map->max_entries; i++) { >> + struct xdp_sock **entry = &m->xsk_map[i]; >> struct xdp_sock *xs; >> >> - xs = m->xsk_map[i]; >> - if (!xs) >> - continue; >> - >> - sock_put((struct sock *)xs); >> + xs = xchg(entry, NULL); >> + if (xs) >> + __xsk_map_delete_elem(m, entry); >> } >> + spin_unlock_bh(&m->lock); >> > > Was this tested? Doesn't the above straight run into a deadlock? > > From xsk_map_free() you iterate over the map with m->lock held. Once you > xchg'ed the entry and call into __xsk_map_delete_elem(), you attempt to > call map->lock on the same map once again. What am I missing? (It also does the xchg() twice so we'd leak the xs since it's NULL in the second one.) > Thanks, > Daniel