Hillf Danton <hdanton@xxxxxxxx> wrote:
> On Mon, 8 Jul 2024 13:58:31 +0200 Florian Westphal <fw@xxxxxxxxx>
> > Hillf Danton <hdanton@xxxxxxxx> wrote:
> > > On Sun, 7 Jul 2024 10:08:24 +0200 Florian Westphal <fw@xxxxxxxxx>
> > > > Hillf Danton <hdanton@xxxxxxxx> wrote:
> > > > > > I think this change might be useful as it also documents
> > > > > > this requirement.
> > > > >
> > > > > Yes it is boy and the current reproducer triggered another warning [1,2].
> > > > >
> > > > > [1] https://lore.kernel.org/lkml/20240706231332.3261-1-hdanton@xxxxxxxx/
> > > >
> > > > The WARN is incorrect.  The destroy list can be non-empty; I already
> > > > tried to explain why.
> > > >
> > > That warning as-is could be a false positive, but it could be triggered with a
> > > single netns.
> >
> > How?
> >
> You saw the below cpu diagram, no?

It did not explain the problem in a way I understand.

>  cpu1                cpu2                cpu3
>  ---                 ---                 ---
>                                          nf_tables_trans_destroy_work()
>                                          spin_lock(&nf_tables_destroy_list_lock);
>
>                                          // 1) clear the destroy list
>                                          list_splice_init(&nf_tables_destroy_list, &head);
>                                          spin_unlock(&nf_tables_destroy_list_lock);

This means @work is running on cpu3 and made a snapshot of the list.
I don't even understand how that's relevant, but OK.

>                      nf_tables_commit_release()
>                      spin_lock(&nf_tables_destroy_list_lock);
>                      // 2) refill the destroy list
>                      list_splice_tail_init(&nft_net->commit_list, &nf_tables_destroy_list);
>                      spin_unlock(&nf_tables_destroy_list_lock);
>                      schedule_work(&trans_destroy_work);
>                      mutex_unlock(&nft_net->commit_mutex);

This means cpu2 has added transaction structures to the list, and those
could reference @table.

It also called schedule_work BEFORE releasing the mutex and after
placing the entries on the destroy list.

>  nft_rcv_nl_event()
>  mutex_lock(&nft_net->commit_mutex);
>  flush_work(&trans_destroy_work);

This means cpu1 serializes vs. cpu2, and @work was already scheduled.

flush_work() must only return once @work is idle, with no other
pending execution.

If it gets scheduled again right after flush_work returns, that is
NOT a problem, as I tried to explain several times.

We hold the transaction mutex, so only a different netns can queue
more work, and such a foreign netns can only see struct nft_table
structures that are private to its own namespace.

>  // 3) flush work ends with the refilled destroy list left intact
>  tear tables down

Again, I do not understand how that is possible.

The postcondition after flush_work returns is:

1. nf_tables_destroy_list is empty, UNLESS the remaining entries belong
   to unrelated net namespaces; those cannot see the tables we are
   tearing down in 3), so they cannot reference them.

2. nf_tables_trans_destroy_work() is NOT running, unless it is processing
   entries queued by other netns after flush_work returned.

cpu2 does:
 -> add trans->table to @nf_tables_destroy_list
 -> unlock list spinlock
 -> schedule_work
 -> unlock mutex

cpu1 does:
 -> lock mutex
 -> flush work

You say this is not enough and that trans->table queued by cpu2 can
still be on @nf_tables_destroy_list.

I say flush_work after taking the mutex guarantees trans->table has
been processed by @work in all cases.
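
To spell out the ordering I am relying on, here is a condensed sketch of
the two paths from the diagram above (simplified, not the verbatim
nf_tables_api.c code; A-D are only labels for the argument):

	/* cpu2: commit path, still holding nft_net->commit_mutex */
	spin_lock(&nf_tables_destroy_list_lock);
	list_splice_tail_init(&nft_net->commit_list, &nf_tables_destroy_list);
	spin_unlock(&nf_tables_destroy_list_lock);
	schedule_work(&trans_destroy_work);	/* A: work queued, list holds trans->table */
	mutex_unlock(&nft_net->commit_mutex);	/* B: mutex released only after A */

	/* cpu1: netlink release path */
	mutex_lock(&nft_net->commit_mutex);	/* C: serializes with B, so C is after A */
	flush_work(&trans_destroy_work);	/* D: cannot return before the work queued
						 * at A has run to completion, i.e. before
						 * trans->table has been destroyed */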