On Thu, Feb 17, 2022 at 09:43:30AM -0800, Paul E. McKenney wrote:
> On Thu, Feb 17, 2022 at 04:25:19PM +0100, Padmanabha Srinivasaiah wrote:
> > While booting secondary CPUs, cpus_read_[lock/unlock] is not keeping
> > the online cpumask stable. The transient online mask results in the
> > calltrace below.
> >
> > [ 0.324121] CPU1: Booted secondary processor 0x0000000001 [0x410fd083]
> > [ 0.346652] Detected PIPT I-cache on CPU2
> > [ 0.347212] CPU2: Booted secondary processor 0x0000000002 [0x410fd083]
> > [ 0.377255] Detected PIPT I-cache on CPU3
> > [ 0.377823] CPU3: Booted secondary processor 0x0000000003 [0x410fd083]
> > [ 0.379040] ------------[ cut here ]------------
> > [ 0.383662] WARNING: CPU: 0 PID: 10 at kernel/workqueue.c:3084 __flush_work+0x12c/0x138
> > [ 0.384850] Modules linked in:
> > [ 0.385403] CPU: 0 PID: 10 Comm: rcu_tasks_rude_ Not tainted 5.17.0-rc3-v8+ #13
> > [ 0.386473] Hardware name: Raspberry Pi 4 Model B Rev 1.4 (DT)
> > [ 0.387289] pstate: 20000005 (nzCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> > [ 0.388308] pc : __flush_work+0x12c/0x138
> > [ 0.388970] lr : __flush_work+0x80/0x138
> > [ 0.389620] sp : ffffffc00aaf3c60
> > [ 0.390139] x29: ffffffc00aaf3d20 x28: ffffffc009c16af0 x27: ffffff80f761df48
> > [ 0.391316] x26: 0000000000000004 x25: 0000000000000003 x24: 0000000000000100
> > [ 0.392493] x23: ffffffffffffffff x22: ffffffc009c16b10 x21: ffffffc009c16b28
> > [ 0.393668] x20: ffffffc009e53861 x19: ffffff80f77fbf40 x18: 00000000d744fcc9
> > [ 0.394842] x17: 000000000000000b x16: 00000000000001c2 x15: ffffffc009e57550
> > [ 0.396016] x14: 0000000000000000 x13: ffffffffffffffff x12: 0000000100000000
> > [ 0.397190] x11: 0000000000000462 x10: ffffff8040258008 x9 : 0000000100000000
> > [ 0.398364] x8 : 0000000000000000 x7 : ffffffc0093c8bf4 x6 : 0000000000000000
> > [ 0.399538] x5 : 0000000000000000 x4 : ffffffc00a976e40 x3 : ffffffc00810444c
> > [ 0.400711] x2 : 0000000000000004 x1 : 0000000000000000 x0 : 0000000000000000
> > [ 0.401886] Call trace:
> > [ 0.402309]  __flush_work+0x12c/0x138
> > [ 0.402941]  schedule_on_each_cpu+0x228/0x278
> > [ 0.403693]  rcu_tasks_rude_wait_gp+0x130/0x144
> > [ 0.404502]  rcu_tasks_kthread+0x220/0x254
> > [ 0.405264]  kthread+0x174/0x1ac
> > [ 0.405837]  ret_from_fork+0x10/0x20
> > [ 0.406456] irq event stamp: 102
> > [ 0.406966] hardirqs last enabled at (101): [<ffffffc0093c8468>] _raw_spin_unlock_irq+0x78/0xb4
> > [ 0.408304] hardirqs last disabled at (102): [<ffffffc0093b8270>] el1_dbg+0x24/0x5c
> > [ 0.409410] softirqs last enabled at (54): [<ffffffc0081b80c8>] local_bh_enable+0xc/0x2c
> > [ 0.410645] softirqs last disabled at (50): [<ffffffc0081b809c>] local_bh_disable+0xc/0x2c
> > [ 0.411890] ---[ end trace 0000000000000000 ]---
> > [ 0.413000] smp: Brought up 1 node, 4 CPUs
> > [ 0.413762] SMP: Total of 4 processors activated.
> > [ 0.414566] CPU features: detected: 32-bit EL0 Support
> > [ 0.415414] CPU features: detected: 32-bit EL1 Support
> > [ 0.416278] CPU features: detected: CRC32 instructions
> > [ 0.447021] Callback from call_rcu_tasks_rude() invoked.
> > [ 0.506693] Callback from call_rcu_tasks() invoked.
> >
> > Link: https://lore.kernel.org/lkml/20220210184319.25009-1-treasure4paddy@xxxxxxxxx/T/
> > Suggested-by: Paul E. McKenney <paulmck@xxxxxxxxxx>
> > Signed-off-by: Padmanabha Srinivasaiah <treasure4paddy@xxxxxxxxx>
>
> Thank you, Padmanabha! I have queued this with additional explanation
> in the commit log as shown below.

Thanks, Paul.
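For reference, the warning comes from the pair of for_each_online_cpu()
loops in schedule_on_each_cpu(). Below is a sketch simplified from
kernel/workqueue.c (as of v5.17-rc) showing just the failing pattern:
if a CPU comes online between the two loops, the second loop calls
flush_work() on a work_struct that was never initialized or queued,
which trips the WARNING at kernel/workqueue.c:3084 seen above.

int schedule_on_each_cpu(work_func_t func)
{
        int cpu;
        struct work_struct __percpu *works;

        works = alloc_percpu(struct work_struct);
        if (!works)
                return -ENOMEM;

        /* During early boot this does not keep the online mask stable. */
        cpus_read_lock();

        for_each_online_cpu(cpu) {
                struct work_struct *work = per_cpu_ptr(works, cpu);

                INIT_WORK(work, func);
                schedule_work_on(cpu, work);
        }

        /* A CPU onlined after the loop above is flushed uninitialized here. */
        for_each_online_cpu(cpu)
                flush_work(per_cpu_ptr(works, cpu));

        cpus_read_unlock();
        free_percpu(works);
        return 0;
}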
> If you have not already tested this by repeatedly rebooting, for
> example continuously in an overnight test, could you please do so?
> (I am assuming that your system can do an automated reboot loop, so no,
> I am not asking you to stay up all night!)

Yes, tested, and the issue does not appear.

> Should this somehow recur with larger systems, in which the call starts
> with (say) five CPUs online but has trouble with the sixth CPU, there
> are several ways to address it:
>
> 1.	Make cpus_read_lock() act as expected.
>
> 2.	Make the first for_each_online_cpu() in schedule_on_each_cpu()
>	instead be for_each_possible_cpu(), always do the INIT_WORK(),
>	but invoke schedule_work_on() only if cpu_online(cpu).

Yes, agreed; this was also suggested by Tejun Heo. Will submit the same
in the other thread (see the sketch at the end of this mail).

> 3.	Other ideas here, when and if.
>
>							Thanx, Paul
>
> ------------------------------------------------------------------------
>
> commit f6823834aa2e2e581ca627238fa3ad3e0a727c08
> Author: Padmanabha Srinivasaiah <treasure4paddy@xxxxxxxxx>
> Date:   Thu Feb 17 16:25:19 2022 +0100
>
>     rcu-tasks: Fix race in schedule and flush work
>
>     While booting secondary CPUs, cpus_read_[lock/unlock] is not keeping
>     the online cpumask stable. The transient online mask results in the
>     calltrace below.
>
>     [ 0.324121] CPU1: Booted secondary processor 0x0000000001 [0x410fd083]
>     [ 0.346652] Detected PIPT I-cache on CPU2
>     [ 0.347212] CPU2: Booted secondary processor 0x0000000002 [0x410fd083]
>     [ 0.377255] Detected PIPT I-cache on CPU3
>     [ 0.377823] CPU3: Booted secondary processor 0x0000000003 [0x410fd083]
>     [ 0.379040] ------------[ cut here ]------------
>     [ 0.383662] WARNING: CPU: 0 PID: 10 at kernel/workqueue.c:3084 __flush_work+0x12c/0x138
>     [ 0.384850] Modules linked in:
>     [ 0.385403] CPU: 0 PID: 10 Comm: rcu_tasks_rude_ Not tainted 5.17.0-rc3-v8+ #13
>     [ 0.386473] Hardware name: Raspberry Pi 4 Model B Rev 1.4 (DT)
>     [ 0.387289] pstate: 20000005 (nzCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>     [ 0.388308] pc : __flush_work+0x12c/0x138
>     [ 0.388970] lr : __flush_work+0x80/0x138
>     [ 0.389620] sp : ffffffc00aaf3c60
>     [ 0.390139] x29: ffffffc00aaf3d20 x28: ffffffc009c16af0 x27: ffffff80f761df48
>     [ 0.391316] x26: 0000000000000004 x25: 0000000000000003 x24: 0000000000000100
>     [ 0.392493] x23: ffffffffffffffff x22: ffffffc009c16b10 x21: ffffffc009c16b28
>     [ 0.393668] x20: ffffffc009e53861 x19: ffffff80f77fbf40 x18: 00000000d744fcc9
>     [ 0.394842] x17: 000000000000000b x16: 00000000000001c2 x15: ffffffc009e57550
>     [ 0.396016] x14: 0000000000000000 x13: ffffffffffffffff x12: 0000000100000000
>     [ 0.397190] x11: 0000000000000462 x10: ffffff8040258008 x9 : 0000000100000000
>     [ 0.398364] x8 : 0000000000000000 x7 : ffffffc0093c8bf4 x6 : 0000000000000000
>     [ 0.399538] x5 : 0000000000000000 x4 : ffffffc00a976e40 x3 : ffffffc00810444c
>     [ 0.400711] x2 : 0000000000000004 x1 : 0000000000000000 x0 : 0000000000000000
>     [ 0.401886] Call trace:
>     [ 0.402309]  __flush_work+0x12c/0x138
>     [ 0.402941]  schedule_on_each_cpu+0x228/0x278
>     [ 0.403693]  rcu_tasks_rude_wait_gp+0x130/0x144
>     [ 0.404502]  rcu_tasks_kthread+0x220/0x254
>     [ 0.405264]  kthread+0x174/0x1ac
>     [ 0.405837]  ret_from_fork+0x10/0x20
>     [ 0.406456] irq event stamp: 102
>     [ 0.406966] hardirqs last enabled at (101): [<ffffffc0093c8468>] _raw_spin_unlock_irq+0x78/0xb4
>     [ 0.408304] hardirqs last disabled at (102): [<ffffffc0093b8270>] el1_dbg+0x24/0x5c
>     [ 0.409410] softirqs last enabled at (54): [<ffffffc0081b80c8>] local_bh_enable+0xc/0x2c
>     [ 0.410645] softirqs last disabled at (50): [<ffffffc0081b809c>] local_bh_disable+0xc/0x2c
>     [ 0.411890] ---[ end trace 0000000000000000 ]---
>     [ 0.413000] smp: Brought up 1 node, 4 CPUs
>     [ 0.413762] SMP: Total of 4 processors activated.
>     [ 0.414566] CPU features: detected: 32-bit EL0 Support
>     [ 0.415414] CPU features: detected: 32-bit EL1 Support
>     [ 0.416278] CPU features: detected: CRC32 instructions
>     [ 0.447021] Callback from call_rcu_tasks_rude() invoked.
>     [ 0.506693] Callback from call_rcu_tasks() invoked.
>
>     This commit therefore fixes this issue by applying a single-CPU
>     optimization to the RCU Tasks Rude grace-period process. The key point
>     here is that the purpose of this RCU flavor is to force a schedule on
>     each online CPU since some past event. But the rcu_tasks_rude_wait_gp()
>     function runs in the context of the RCU Tasks Rude's grace-period kthread,
>     so there must already have been a context switch on the current CPU since
>     the call to either synchronize_rcu_tasks_rude() or call_rcu_tasks_rude().
>     So if there is only a single CPU online, RCU Tasks Rude's grace-period
>     kthread does not need to do anything at all.
>
>     It turns out that the rcu_tasks_rude_wait_gp() function's call to
>     schedule_on_each_cpu() causes problems during early boot. During that
>     time, there is only one online CPU, namely the boot CPU. Therefore,
>     applying this single-CPU optimization fixes early-boot instances of
>     this problem.
>
>     Link: https://lore.kernel.org/lkml/20220210184319.25009-1-treasure4paddy@xxxxxxxxx/T/
>     Suggested-by: Paul E. McKenney <paulmck@xxxxxxxxxx>
>     Signed-off-by: Padmanabha Srinivasaiah <treasure4paddy@xxxxxxxxx>
>     Signed-off-by: Paul E. McKenney <paulmck@xxxxxxxxxx>
>
> diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
> index d73e32d803438..f7fdb349f8b8e 100644
> --- a/kernel/rcu/tasks.h
> +++ b/kernel/rcu/tasks.h
> @@ -955,6 +955,9 @@ static void rcu_tasks_be_rude(struct work_struct *work)
>  // Wait for one rude RCU-tasks grace period.
>  static void rcu_tasks_rude_wait_gp(struct rcu_tasks *rtp)
>  {
> +	if (num_online_cpus() <= 1)
> +		return;	// Fastpath for only one CPU.
> +
>  	rtp->n_ipis += cpumask_weight(cpu_online_mask);
>  	schedule_on_each_cpu(rcu_tasks_be_rude);
>  }
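And for completeness, here is an untested sketch of option 2 above:
make the first loop in schedule_on_each_cpu() iterate over all possible
CPUs, always do the INIT_WORK(), and queue work only on CPUs that are
actually online. Because flush_work() returns immediately for a work
item that was initialized but never queued, a CPU that comes online
between the two loops can no longer trip the warning. This is only an
illustration of the idea, not the actual patch for the other thread.

int schedule_on_each_cpu(work_func_t func)
{
        int cpu;
        struct work_struct __percpu *works;

        works = alloc_percpu(struct work_struct);
        if (!works)
                return -ENOMEM;

        cpus_read_lock();

        /* Initialize every possible CPU's work, queue only the online ones. */
        for_each_possible_cpu(cpu) {
                struct work_struct *work = per_cpu_ptr(works, cpu);

                INIT_WORK(work, func);
                if (cpu_online(cpu))
                        schedule_work_on(cpu, work);
        }

        /*
         * flush_work() on an initialized-but-unqueued work item returns
         * immediately, so a CPU onlined after the loop above is harmless.
         */
        for_each_online_cpu(cpu)
                flush_work(per_cpu_ptr(works, cpu));

        cpus_read_unlock();
        free_percpu(works);
        return 0;
}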