On Fri, Sep 02, 2022 at 01:52:28PM +0200, Sascha Hauer wrote: > On Thu, Sep 01, 2022 at 10:33:04AM -0700, Paul E. McKenney wrote: > > On Thu, Sep 01, 2022 at 10:27:25AM -0700, Paul E. McKenney wrote: > > > On Thu, Sep 01, 2022 at 12:36:25PM +0200, Sascha Hauer wrote: > > > > Hi Paul, > > > > > > > > On Mon, Jun 20, 2022 at 03:53:43PM -0700, Paul E. McKenney wrote: > > > > > This commit causes synchronous grace periods to be driven from the task > > > > > invoking synchronize_rcu_*(), allowing these functions to be invoked from > > > > > the mid-boot dead zone extending from when the scheduler was initialized > > > > > to to point that the various RCU tasks grace-period kthreads are spawned. > > > > > This change will allow the self-tests to run in a consistent manner. > > > > > > > > > > Reported-by: Matthew Wilcox <willy@xxxxxxxxxxxxx> > > > > > Reported-by: Zhouyi Zhou <zhouzhouyi@xxxxxxxxx> > > > > > Signed-off-by: Paul E. McKenney <paulmck@xxxxxxxxxx> > > > > > > > > This commit (appeared in mainline as 4a8cc433b8bf) breaks booting my > > > > ARMv7 based i.MX6ul board when CONFIG_PROVE_RCU is enabled. Reverting > > > > this patch on v6.0-rc3 makes my board boot again. See below for a boot > > > > log. The last message is "Running RCU-tasks wait API self tests", after > > > > that the board hangs. Any idea what goes wrong here? > > > > > > New one on me! > > > > > > Is it possible to get a stack trace of the hang, perhaps via > > > one form or another of sysrq-T? Such a stack trace would likely > > > include synchronize_rcu_tasks(), synchronize_rcu_tasks_rude(), or > > > synchronize_rcu_tasks_trace() followed by synchronize_rcu_tasks_generic(), > > > rcu_tasks_one_gp(), and one of rcu_tasks_wait_gp(), > > > rcu_tasks_rude_wait_gp(), or rcu_tasks_wait_gp(). > > > > If there is no chance of sysrq-T, kernel debuggers, kernel crash > > dumps, or any other source of the stack trace, please decorate the > > code path with printk() or similar and let me know where it goes. > > Under normal circumstances, this code path is not sensitive to performance > > perturbations of the printk() persuasion. > > Some unrelated bug I was searching for made me turn on early console > output with the "earlycon" parameter. It turned out that when I remove > this parameter then my board boots fine. > > I then realized that even with earlycon enabled my board boots fine > when I remove the call to > > pr_info("Running RCU-tasks wait API self tests\n"); Ah, there are some printk() fixes in the works. Maybe this is one area needed that. Or maybe not. > Given that I am not sure how useful it is to add more printk. I did that > anyway like this: > > > static void rcu_tasks_one_gp(struct rcu_tasks *rtp, bool midboot) > > { > > int needgpcb; > > > > printk("%s: mutex_lock... midboot: %d\n", __func__, midboot); > > mutex_lock(&rtp->tasks_gp_mutex); > > printk("%s: mutex_locked midboot: %d\n", __func__, midboot); > > > > // If there were none, wait a bit and start over. > > if (unlikely(midboot)) { > > needgpcb = 0x2; > > } else { > > printk("%s: set_tasks_gp_state(RTGS_WAIT_CBS)...\n", __func__); > > set_tasks_gp_state(rtp, RTGS_WAIT_CBS); > > printk("%s: rcuwait_wait_event...\n", __func__); > > rcuwait_wait_event(&rtp->cbs_wait, > > (needgpcb = rcu_tasks_need_gpcb(rtp)), > > TASK_IDLE); > > printk("%s: rcuwait_wait_event done\n", __func__); > > } > > > > What I see then is: > > [ 0.156362] synchronize_rcu_tasks_generic: rcu_tasks_one_gp.... > [ 0.162087] rcu_tasks_one_gp: mutex_lock... midboot: 1 So one task gets stuck either in mutex_lock() or the printk() above and some other task below moves ahead? Or might some printk()s have been lost? > [ 0.167386] rcu_tasks_one_gp: mutex_lock... midboot: 0 > [ 0.172489] rcu_tasks_one_gp: mutex_locked midboot: 0 > [ 0.177535] rcu_tasks_one_gp: set_tasks_gp_state(RTGS_WAIT_CBS)... > [ 0.183525] rcu_tasks_one_gp: rcuwait_wait_event... Given that everything works with printk()s turned off, my current suspicion is a printk() issue. > Here the board hangs. After some time I get: > > [ 254.493010] random: crng init done This looks unrelated. > But that's it. > > > > > > At this point in the boot sequence, there is only one online CPU, > > > correct? > > Yes, it's a single core system. OK, then we should be able to rule out SMP issues. ;-) Thanx, Paul > Sascha > > -- > Pengutronix e.K. | | > Steuerwalder Str. 21 | http://www.pengutronix.de/ | > 31137 Hildesheim, Germany | Phone: +49-5121-206917-0 | > Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 |