Shawn Bohrer <sbohrer@xxxxxxxxxxxxxx> wrote: > On Wed, Dec 26, 2018 at 02:55:00PM +0100, Pablo Neira Ayuso wrote: > > On Wed, Dec 26, 2018 at 02:41:59PM +0100, Pablo Neira Ayuso wrote: > > > Instead of removing a empty list node that might be reintroduced soon > > > thereafter, tentatively place the empty list node in the garbage > > > collector, then re-check if the list is empty again before deleting it. > > > > > > This patch is aiming to simplify the garbage collection interaction > > > between the packet path and the workqueue to delete empty lists. > > > > Hm, still not good enough. > > > > Workqueue and packet path may race to place the same node in the > > gc_nodes[] array, leading to possible use-after-free. > > Hey Pablo, > > I assume you realize this, but the race can happen with or without > your patch. I think you could fix the race by expanding the spinlock > in tree_gc_worker() to cover the walking of the tree instead of doing > it as a RCU reader. I suggest to split this in two phases, like this: diff --git a/net/netfilter/nf_conncount.c b/net/netfilter/nf_conncount.c --- a/net/netfilter/nf_conncount.c +++ b/net/netfilter/nf_conncount.c @@ -494,16 +494,29 @@ static void tree_gc_worker(struct work_struct *work) for (node = rb_first(root); node != NULL; node = rb_next(node)) { rbconn = rb_entry(node, struct nf_conncount_rb, node); if (nf_conncount_gc_list(data->net, &rbconn->list)) - gc_nodes[gc_count++] = rbconn; + gc_count++; } rcu_read_unlock(); spin_lock_bh(&nf_conncount_locks[tree]); + if (gc_count < ARRAY_SIZE(gc_nodes)) + goto next; /* do not bother */ - if (gc_count) { - tree_nodes_free(root, gc_nodes, gc_count); + gc_count = 0; + for (node = rb_first(root); node != NULL; node = rb_next(node)) { + rbconn = rb_entry(node, struct nf_conncount_rb, node); + if (rbconn->list.count > 0) + continue; + + gc_nodes[gc_count++] = rbconn; + if (gc_count >= ARRAY_SIZE(gc_nodes)) { + tree_nodes_free(root, gc_nodes, gc_count); + gc_count = 0; + } } + tree_nodes_free(root, gc_nodes, gc_count); +next: clear_bit(tree, data->pending_trees); next_tree = (tree + 1) % CONNCOUNT_SLOTS; I think we should also remove list->dead and the 'free_entry' indicator passed via find_or_evict(). They make things way too complicated. Removing a tree node is only possible when we hold the nf_conncount_lock for that tree, so we can simply check for list->count == 0 and evict in that case (from gc phase two), defer to insert_tree() (from rcu insert phase), or just bump its counter back to 1 (from insert_tree()). I still agree with the comments I made to Shawns patches, at the very least removing the extra define for lock count should go, its too easy to mix those two up.