Pablo Neira Ayuso <pablo@xxxxxxxxxxxxx> wrote: > On Thu, Oct 25, 2018 at 11:56:12PM +0900, Taehee Yoo wrote: > > conn_free() holds lock with spin_lock(). and it is called by both > > nf_conncount_lookup() and nf_conncount_gc_list(). > > nf_conncount_lookup() is bottom-half context and nf_conncount_gc_list() > > is process context. so that spin_lock() is not safe. > > Hence conn_free() should use spin_lock_bh() instead of spin_lock(). > > > > test commands: > > %nft add table ip filter > > %nft add chain ip filter input { type filter hook input priority 0\; } > > %nft add rule filter input meter test { ip saddr ct count over 2 } \ > > counter > > > > splat looks like: > > [ 461.996507] ================================ > > [ 461.998999] WARNING: inconsistent lock state > > [ 461.998999] 4.19.0-rc6+ #22 Not tainted > > [ 461.998999] -------------------------------- > > [ 461.998999] inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage. > > [ 461.998999] kworker/0:2/134 [HC0[0]:SC0[0]:HE1:SE1] takes: > > [ 461.998999] 00000000a71a559a (&(&list->list_lock)->rlock){+.?.}, at: conn_free+0x69/0x2b0 [nf_conncount] > > [ 461.998999] {IN-SOFTIRQ-W} state was registered at: > > [ 461.998999] _raw_spin_lock+0x30/0x70 > > [ 461.998999] nf_conncount_add+0x28a/0x520 [nf_conncount] > > [ 461.998999] nft_connlimit_eval+0x401/0x580 [nft_connlimit] > > [ 461.998999] nft_dynset_eval+0x32b/0x590 [nf_tables] > > [ 461.998999] nft_do_chain+0x497/0x1430 [nf_tables] > > [ 461.998999] nft_do_chain_ipv4+0x255/0x330 [nf_tables] > > [ 461.998999] nf_hook_slow+0xb1/0x160 > > [ ... ] > > [ 461.998999] other info that might help us debug this: > > [ 461.998999] Possible unsafe locking scenario: > > [ 461.998999] > > [ 461.998999] CPU0 > > [ 461.998999] ---- > > [ 461.998999] lock(&(&list->list_lock)->rlock); > > [ 461.998999] <Interrupt> > > [ 461.998999] lock(&(&list->list_lock)->rlock); > > [ 461.998999] > > [ 461.998999] *** DEADLOCK *** > > [ 461.998999] > > [ ... ] > > nf_conncount_add() also holds spin_lock while allocate from there is > GFP_ATOMIC given this is called from packet path. Good catch, yes, this needs spin_lock_bh variant too. > tree_nodes_free() is also called from user context without _bh > disabled. This one is fine, both call sites hold spin_lock_bh(&nf_conncount_locks[x]).