RE: [PATCH] rt: cpufreq: Fix cpu hotplug hang

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Sebastian,

  Sorry for the late response due to bandwidth issue.
  Please see my reply inline.

On Thursday, February 25, 2021 12:38 AM, Sebastian Siewior wrote:
> 
> On 2021-02-23 08:43:08 [+0530], Viresh Kumar wrote:
> > On 23-02-21, 02:26, Ran Wang wrote:
> > > On Monday, February 22, 2021 10:01 PM, Sebastian Siewior wrote:
> > > > On 2021-02-19 16:44:20 [+0800], Ran Wang wrote:
> > > > > When selecting PREEMPT_RT, cpufreq_driver->stop_cpu(policy)
> > > > > might got stuck due to irq_work_sync() pending for work on
> > > > > lazy_list. That’s because lazy_list may have no chance to be
> > > > > served in softirq context sometimes. Below is one of scenarios that was captured:
> > > > >
> > > > > ...
> > > > > ret_from_fork
> > > > >  kthread
> > > > >   smpboot_thread_fn
> > > > >    cpuhp_thread_fun
> > > > >     cpuhp_invoke_callback: state: 193
> > > > >      cpuhp_cpufreq_online
> > > > >       cpufreq_online
> > > > >        cpufreq_driver->stop_cpu(policy);
> > > > >         cpufreq_dbs_governor_stop
> > > > >          sugov_stop  // kernel/sched/cpufreq_schedutil.c
> > > > >           irq_work_sync(&sg_policy->irq_work);
> > > > >
> > > > > This is observed on LX2160ARDB (16 A72 cores) with cpufreq
> > > > > governor of ‘schedutil’ or ‘ondemand’.
> > > >
> > > > While staring at it, why do we invoke schedule_work_on() and
> > > > kthread_queue_work() from inside irq_work() instead invoking it directly? It raises an interrupt in which it kicks a user thread.
> > > > Couldn't we do it without irq_work?
> >
> > Because we reach there from scheduler's context, which must be
> > hard-irq context..
> 
> Oha. Too bad.
> Ran, any chance to figure out why the softirq can not be served?
> ksoftirqd and the timer should be up.

Further debugging shows that sometimes the nearest coming local timer is expected to expire after a long time (100+ seconds).
That’s why function run_local_timers() gets no chance to call raise_softirq(TIMER_SOFTIRQ) to handle enqueued irq_work
for a long timer, causing irq_work_sync() pending when doing CPU hotplug operation.

Below is one of trace I captured, showing that sometimes [timeout=xxx] value would be much bigger than others (see [timeout=30070]).
And we can see on this particular example (CPU2), there would be 100+ seconds idle between 2 'timer_expire_entry' (see timestamp of 613.43867)

     ksoftirqd/2-26      [002] d..s.13    35.046622: timer_expire_entry: timer=000000008f7d5140 function=delayed_work_timer_fn now=4294901034 baseclk=4294900752
     ksoftirqd/2-26      [002] dnLs.13    35.046634: timer_expire_exit: timer=000000008f7d5140
     kworker/2:1-106     [002] d...1..    38.942670: timer_start: timer=000000008f7d5140 function=delayed_work_timer_fn expires=4294902244 [timeout=236] cpu=2 idx=125 flags=D|I
     ksoftirqd/2-26      [002] d..s.13   316.486673: timer_expire_entry: timer=000000008f7d5140 function=delayed_work_timer_fn now=4294971394 baseclk=4294902248
     ksoftirqd/2-26      [002] d..s.13   316.486688: timer_expire_exit: timer=000000008f7d5140
     ksoftirqd/2-26      [002] ...s.13   316.486694: timer_expire_entry: timer=00000000b44e5013 function=idle_worker_timeout now=4294971394 baseclk=4294971392
     ksoftirqd/2-26      [002] .nLs.13   316.486707: timer_expire_exit: timer=00000000b44e5013
   kworker/u16:0-7       [002] d...1..   492.606742: timer_start: timer=000000003fe6a257 function=delayed_work_timer_fn expires=4295045494 [timeout=30070] cpu=2 idx=217 flags=I
   kworker/u16:0-7       [002] d...1..   492.606756: timer_start: timer=000000003b28e768 function=delayed_work_timer_fn expires=4295022994 [timeout=7570] cpu=2 idx=237 flags=D|I
   kworker/u16:0-7       [002] d...113   492.606766: timer_start: timer=0000000062a01a47 function=delayed_work_timer_fn expires=4295019174 [timeout=3750] cpu=2 idx=171 flags=D|I
   kworker/u16:0-7       [002] d...113   492.606776: timer_start: timer=000000000bbd8bb1 function=delayed_work_timer_fn expires=4295019174 [timeout=3750] cpu=2 idx=171 flags=D|I
   kworker/u16:0-7       [002] d...1..   492.606860: timer_start: timer=00000000f1f3a9cb function=delayed_work_timer_fn expires=4295015455 [timeout=31] cpu=2 idx=32 flags=D|I
     ksoftirqd/2-26      [002] d..s.13   613.438671: timer_expire_entry: timer=000000003fe6a257 function=delayed_work_timer_fn now=4295045632 baseclk=4295045632
     ksoftirqd/2-26      [002] d..s.13   613.438695: timer_expire_exit: timer=000000003fe6a257
     ksoftirqd/2-26      [002] d..s.13   613.438703: timer_expire_entry: timer=00000000f1f3a9cb function=delayed_work_timer_fn now=4295045632 baseclk=4295015456
     ksoftirqd/2-26      [002] d..s.13   613.438717: timer_expire_exit: timer=00000000f1f3a9cb
     ksoftirqd/2-26      [002] d..s.13   613.438723: timer_expire_entry: timer=000000000bbd8bb1 function=delayed_work_timer_fn now=4295045632 baseclk=4295019200
     ksoftirqd/2-26      [002] d..s.13   613.438741: timer_expire_exit: timer=000000000bbd8bb1
     ksoftirqd/2-26      [002] d..s.13   613.438745: timer_expire_entry: timer=0000000062a01a47 function=delayed_work_timer_fn now=4295045632 baseclk=4295019200
     ksoftirqd/2-26      [002] d..s.13   613.438756: timer_expire_exit: timer=0000000062a01a47
     ksoftirqd/2-26      [002] d..s.13   613.438762: timer_expire_entry: timer=000000003b28e768 function=delayed_work_timer_fn now=4295045632 baseclk=4295023104

> Any numbers on how often that is irq_work is scheduled?

I add some prints in function run_timer_softirq(), and collect some worst cases (CPU 5 and CPU3) as below,
looks some irq_work might have to wait for long time to get served in such case:

[   13.040774] [005]:run_timer_softirq():call irq_work_tick_soft();
[   17.078788] [005]:run_timer_softirq():call irq_work_tick_soft();
[   19.222878] [005]:run_timer_softirq():call irq_work_tick_soft();
[   49.608798] [005]:run_timer_softirq():call irq_work_tick_soft();
[   56.564773] [005]:run_timer_softirq():call irq_work_tick_soft();
[  124.368772] [005]:run_timer_softirq():call irq_work_tick_soft();
[  124.880821] [005]:run_timer_softirq():call irq_work_tick_soft();
[  125.392842] [005]:run_timer_softirq():call irq_work_tick_soft();
[  129.556828] [005]:run_timer_softirq():call irq_work_tick_soft();
[  132.336839] [005]:run_timer_softirq():call irq_work_tick_soft();
[  253.168839] [005]:run_timer_softirq():call irq_work_tick_soft();
[  376.048841] [005]:run_timer_softirq():call irq_work_tick_soft();

[   13.072774] [003]:run_timer_softirq():call irq_work_tick_soft();
[   13.232774] [003]:run_timer_softirq():call irq_work_tick_soft();
[   14.064773] [003]:run_timer_softirq():call irq_work_tick_soft();
[   15.056773] [003]:run_timer_softirq():call irq_work_tick_soft();
[   16.080773] [003]:run_timer_softirq():call irq_work_tick_soft();
[   17.072772] [003]:run_timer_softirq():call irq_work_tick_soft();
[   17.840774] [003]:run_timer_softirq():call irq_work_tick_soft();
[   18.352775] [003]:run_timer_softirq():call irq_work_tick_soft();
[   18.864774] [003]:run_timer_softirq():call irq_work_tick_soft();
[   19.208796] [003]:run_timer_softirq():call irq_work_tick_soft();
[   19.376772] [003]:run_timer_softirq():call irq_work_tick_soft();
[   19.888776] [003]:run_timer_softirq():call irq_work_tick_soft();
[   20.118792] [003]:run_timer_softirq():call irq_work_tick_soft();
[   82.992773] [003]:run_timer_softirq():call irq_work_tick_soft();
[  125.072801] [003]:run_timer_softirq():call irq_work_tick_soft();
[  126.064772] [003]:run_timer_softirq():call irq_work_tick_soft();
[  127.056796] [003]:run_timer_softirq():call irq_work_tick_soft();
[  128.976773] [003]:run_timer_softirq():call irq_work_tick_soft();

Regards,
Ran




[Index of Archives]     [RT Stable]     [Kernel Newbies]     [IDE]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux ATA RAID]     [Samba]     [Video 4 Linux]     [Device Mapper]

  Powered by Linux