On Thu, Jan 18, 2024 at 12:08 PM Paolo Abeni <pabeni@xxxxxxxxxx> wrote: > > On Wed, 2024-01-17 at 17:28 +0100, Eric Dumazet wrote: > > On Wed, Jan 17, 2024 at 5:23 PM Jozsef Kadlecsik <kadlec@xxxxxxxxxxxxx> wrote: > > > > > > Hi, > > > > > > On Wed, 17 Jan 2024, Eric Dumazet wrote: > > > > > > > On Wed, Jan 17, 2024 at 5:00 PM Pablo Neira Ayuso <pablo@xxxxxxxxxxxxx> wrote: > > > > > > > > > > From: Jozsef Kadlecsik <kadlec@xxxxxxxxxxxxx> > > > > > > > > > > The patch "netfilter: ipset: fix race condition between swap/destroy > > > > > and kernel side add/del/test", commit 28628fa9 fixes a race condition. > > > > > But the synchronize_rcu() added to the swap function unnecessarily slows > > > > > it down: it can safely be moved to destroy and use call_rcu() instead. > > > > > Thus we can get back the same performance and preventing the race condition > > > > > at the same time. > > > > > > > > ... > > > > > > > > > > > > > > @@ -2357,6 +2369,9 @@ ip_set_net_exit(struct net *net) > > > > > > > > > > inst->is_deleted = true; /* flag for ip_set_nfnl_put */ > > > > > > > > > > + /* Wait for call_rcu() in destroy */ > > > > > + rcu_barrier(); > > > > > + > > > > > nfnl_lock(NFNL_SUBSYS_IPSET); > > > > > for (i = 0; i < inst->ip_set_max; i++) { > > > > > set = ip_set(inst, i); > > > > > -- > > > > > 2.30.2 > > > > > > > > > > > > > If I am reading this right, time for netns dismantles will increase, > > > > even for netns not using ipset > > > > > > > > If there is no other option, please convert "struct pernet_operations > > > > ip_set_net_ops".exit to an exit_batch() handler, > > > > to at least have a factorized rcu_barrier(); > > > > > > You are right, the call to rcu_barrier() can safely be moved to > > > ip_set_fini(). I'm going to prepare a new version of the patch. > > > > > > Thanks for catching it. > > > > I do not want to hold the series, your fix can be built as another > > patch on top of this one. > > Given the timing, if we merge this series as is, it could go very soon > into Linus' tree. I think it would be better to avoid introducing known > regressions there. > > Any strong opinions vs holding this series until the problems are > fixed? Likely a new PR will be required. > If this helps, here is one splat (using linux-next next-20240118) BUG: sleeping function called from invalid context at kernel/workqueue.c:3348 in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 22194, name: syz-executor.0 preempt_count: 101, expected: 0 RCU nest depth: 0, expected: 0 3 locks held by syz-executor.0/22194: #0: ffff8880162a2420 (sb_writers#5){.+.+}-{0:0}, at: ksys_write+0x12f/0x260 fs/read_write.c:643 #1: ffff888042b48960 (&sb->s_type->i_mutex_key#12){+.+.}-{3:3}, at: inode_lock include/linux/fs.h:802 [inline] #1: ffff888042b48960 (&sb->s_type->i_mutex_key#12){+.+.}-{3:3}, at: shmem_file_write_iter+0x8c/0x140 mm/shmem.c:2883 #2: ffffffff8d5aeb00 (rcu_callback){....}-{0:0}, at: rcu_lock_acquire include/linux/rcupdate.h:298 [inline] #2: ffffffff8d5aeb00 (rcu_callback){....}-{0:0}, at: rcu_do_batch kernel/rcu/tree.c:2152 [inline] #2: ffffffff8d5aeb00 (rcu_callback){....}-{0:0}, at: rcu_core+0x7cc/0x16b0 kernel/rcu/tree.c:2433 Preemption disabled at: [<ffffffff813c061e>] unwind_next_frame+0xce/0x2390 arch/x86/kernel/unwind_orc.c:479 CPU: 0 PID: 22194 Comm: syz-executor.0 Not tainted 6.7.0-next-20240118-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 11/17/2023 Call Trace: <IRQ> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x125/0x1b0 lib/dump_stack.c:106 __might_resched+0x3c0/0x5e0 kernel/sched/core.c:10176 start_flush_work kernel/workqueue.c:3348 [inline] __flush_work+0x11f/0xa20 kernel/workqueue.c:3410 __cancel_work_timer+0x3f3/0x590 kernel/workqueue.c:3498 hash_ipmac6_destroy+0x337/0x420 net/netfilter/ipset/ip_set_hash_gen.h:454 ip_set_destroy_set+0x65/0x100 net/netfilter/ipset/ip_set_core.c:1180 rcu_do_batch kernel/rcu/tree.c:2158 [inline] rcu_core+0x828/0x16b0 kernel/rcu/tree.c:2433 __do_softirq+0x218/0x8de kernel/softirq.c:553 invoke_softirq kernel/softirq.c:427 [inline] __irq_exit_rcu kernel/softirq.c:632 [inline] irq_exit_rcu+0xb9/0x120 kernel/softirq.c:644 sysvec_apic_timer_interrupt+0x95/0xb0 arch/x86/kernel/apic/apic.c:1076 </IRQ> <TASK> asm_sysvec_apic_timer_interrupt+0x1a/0x20 arch/x86/include/asm/idtentry.h:649 RIP: 0010:on_stack arch/x86/include/asm/stacktrace.h:59 [inline]