On Fri, Feb 14, 2025 at 7:37 PM Petr Mladek <pmladek@xxxxxxxx> wrote:
>
> On Fri 2025-02-14 00:36:03, Josh Poimboeuf wrote:
> > On Fri, Feb 14, 2025 at 10:44:59AM +0800, Yafang Shao wrote:
> > > The longest duration of klp_try_complete_transition() ranges from 8.5
> > > to 17.2 seconds.
> > >
> > > It appears that the RCU stall is not only driven by num_processes *
> > > average_klp_try_switch_task, but also by contention within
> > > klp_try_complete_transition(), particularly around the tasklist_lock.
> > > Interestingly, even after replacing "read_lock(&tasklist_lock)" with
> > > "rcu_read_lock()", the RCU stall persists. My verification shows that
> > > the only way to prevent the stall is by checking need_resched() during
> > > each iteration of the loop.
> >
> > I'm confused... rcu_read_lock() shouldn't cause any contention, right?
> > So if klp_try_switch_task() isn't the problem, then what is?
>
> I agree that it does not make much sense.

I'm confused too and trying to understand it better.

>
> > I wonder if those function timings might be misleading. If
> > klp_try_complete_transition() gets preempted immediately when it
> > releases the lock, it could take a while before it eventually returns.
> > So that funclatency might not be telling the whole story.
>
> The scheduling might be an explanation.
>
> > Though 8.5 - 17.2 seconds is a bit excessive...
>
> If klp_try_complete_transition() scheduled out and we see this delay
> then the system likely had a pretty high load at the moment.
> Is it possible?

It appears to be workload-related. The RCU warning occurred at
specific time periods, likely due to certain workloads running at
those times, though I haven't confirmed it yet.

>
> Yafang, just to be sure. Have you seen these numbers with
> the original klp_try_complete_transition() code and with debug
> messages disabled?

Right.
These RCU warnings appeared on our production servers without any
debugging enabled, and klp_try_complete_transition() hasn't changed
either.

>
> Or did you see them with some extra debugging code or other
> modifications?

No, these are the default production settings as they originally were.

>
> Also just to be sure. Is this on bare metal?

Yes.

>
> Finally, what preemption mode are you using? Which CONFIG_PREEMPT*?

The preemption configuration is as follows:

CONFIG_PREEMPT_BUILD=y
# CONFIG_PREEMPT_NONE is not set
CONFIG_PREEMPT_VOLUNTARY=y
# CONFIG_PREEMPT is not set
CONFIG_PREEMPT_COUNT=y
CONFIG_PREEMPTION=y
CONFIG_PREEMPT_DYNAMIC=y
CONFIG_PREEMPT_RCU=y
CONFIG_HAVE_PREEMPT_DYNAMIC=y
CONFIG_HAVE_PREEMPT_DYNAMIC_CALL=y
CONFIG_PREEMPT_NOTIFIERS=y
# CONFIG_DEBUG_PREEMPT is not set
CONFIG_PREEMPTIRQ_TRACEPOINTS=y
# CONFIG_PREEMPT_TRACER is not set
# CONFIG_PREEMPTIRQ_DELAY_TEST is not set

> PS: JFYI, I have vacation the following week and won't have
> access to mails...

Enjoy your holiday.

--
Regards
Yafang