On Thu, Jul 04, 2019 at 06:57:26PM -0700, Paul E. McKenney wrote:
[snip]
> > I tried again, if I make sure the ftrace dump absolutely does not happen
> > until the preempt-disable loop is done marked by a new global variable as you
> > pointed, then it fixes it. And I don't need any set_preempt_need_resched() or
> > rcu_perf_shutdown_wait() in my preempt disable loop to fix it. Basically the
> > below diff. However, it still does not answer the question about why a parallel
> > ftrace dump running in parallel with the still running preempt-disable loop
> > caused some writers to have multi-second grace periods. I think something
> > during the ftrace dump prevented the tick path of that loop CPU to set the
> > need-resched flag. It is quite hard to trace because the problem itself is
> > caused by tracing, so by the time the dump starts, the traces cannot be seen
> > after that which are what would give a clue here.
>
> Hmmm... Doesn't ftrace_dump() iterate through the trace buffer with
> interrupts disabled or some such? If so, that would fully explain
> its delaying RCU grace periods.

Looking through the ftrace_dump() code, I don't see interrupts being
disabled anywhere, and in this case the dump would be happening on a
different CPU than my preempt-disable loop anyway: that loop runs on a CPU
I reserved, and the writer thread doing the dump runs on a different CPU.
So it is a bit odd that the presence of my preempt-disable loop affects
anything. Without the preempt-disable loop in the first place, this issue
does not occur.

(Also added "attn: Steve" for the tracing question, to get his attention
since this thread is very long.)

Steven, any thoughts on how rcu_ftrace_dump() can affect grace-period
durations or other parts of RCU? Do you see how it could impact the RCU GP
thread, if at all? I did set up RT priority 10 for the thread.

thanks,

 - Joel