On Sat, Jul 06, 2019 at 08:18:07AM -0400, Joel Fernandes wrote:
> On Thu, Jul 04, 2019 at 06:57:26PM -0700, Paul E. McKenney wrote:
> [snip]
> > > I tried again, if I make sure the ftrace dump absolutely does not happen
> > > until the preempt-disable loop is done marked by a new global variable as you
> > > pointed, then it fixes it. And I don't need any set_preempt_need_resched() or
> > > rcu_perf_shutdown_wait() in my preempt disable loop to fix it. Basically the
> > > below diff. However, it still does not answer the question about why a parallel
> > > ftrace dump running in parallel with the still running preempt-disable loop
> > > caused some writers to have multi-second grace periods. I think something
> > > during the ftrace dump prevented the tick path of that loop CPU to set the
> > > need-resched flag. It is quite hard to trace because the problem itself is
> > > caused by tracing, so by the time the dump starts, the traces cannot be seen
> > > after that which are what would give a clue here.
> >
> > Hmmm... Doesn't ftrace_dump() iterate through the trace buffer with
> > interrupts disabled or some such? If so, that would fully explain
> > its delaying RCU grace periods.
>
> Looking through the ftrace_dump() code, I don't see any interrupt disabled
> happening, and in this case it would be happening on a different CPU than my
> preempt disable loop anyway since that loop runs on a CPU I reserved, and the
> writer thread doing the dump runs on a different CPU. So it is a bit odd that
> the presence of my preempt disable loop affects anything. Not having the
> preempt disable loop in the first place, does not have this issue.
>
> (Also added "attn: Steve" for the tracing question, to get his attention
> since this thread is very long).
> Steven, any thoughts on how rcu_ftrace_dump() can affect grace-period
> durations or other RCU parts? Do you see how it could impact the RCU GP
> thread if at all? I did setup RT priority 10 for the thread.

I see a local_irq_save() a few lines into ftrace_dump() itself. Am I
missing where interrupts are being re-enabled prior to the trace-dump
loop?

							Thanx, Paul