Hi Kumar,

On Tuesday, March 30, 2021 12:46 PM, Viresh Kumar wrote:
> On 30-03-21, 11:15, Ran Wang wrote:
> > When selecting PREEMPT_RT, cpufreq_driver->stop_cpu(policy) might get
> > stuck due to irq_work_sync() pending for work on lazy_list, which had
> > no chance to be served in softirq context sometimes.
> >
> > The reason of lazy_list was not served is because the nearest
> > activated timer might have been set to expire after long time (such as 100+ seconds).
> > Then function run_local_timers() would not call
> > raise_softirq(TIMER_SOFTIRQ) to handle enqueued irq_work.
> >
> > This is observed on LX2160ARDB and LS1088ARDB with cpufreq governor of
> > ‘schedutil’ or ‘ondemand’.
> >
> > Configure related irqwork to run on raw-irq context could fix this issue.
> >
> > Signed-off-by: Ran Wang <ran.wang_1@xxxxxxx>
> > ---
> > Change in v2:
> >  - Update commit message to explain root cause more clear.
> >
> >  drivers/cpufreq/cpufreq_governor.c | 2 +-
> >  kernel/sched/cpufreq_schedutil.c   | 2 +-
> >  2 files changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c
> > index 63f7c219062b..731a7b1434df 100644
> > --- a/drivers/cpufreq/cpufreq_governor.c
> > +++ b/drivers/cpufreq/cpufreq_governor.c
> > @@ -360,7 +360,7 @@ static struct policy_dbs_info *alloc_policy_dbs_info(struct cpufreq_policy *poli
> >  	policy_dbs->policy = policy;
> >  	mutex_init(&policy_dbs->update_mutex);
> >  	atomic_set(&policy_dbs->work_count, 0);
> > -	init_irq_work(&policy_dbs->irq_work, dbs_irq_work);
> > +	policy_dbs->irq_work = IRQ_WORK_INIT_HARD(dbs_irq_work);
> >  	INIT_WORK(&policy_dbs->work, dbs_work_handler);
> >
> >  	/* Set policy_dbs for all CPUs, online+offline */
> > diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> > index 50cbad89f7fa..1d5af87ec92e 100644
> > --- a/kernel/sched/cpufreq_schedutil.c
> > +++ b/kernel/sched/cpufreq_schedutil.c
> > @@ -611,7 +611,7 @@ static int sugov_kthread_create(struct sugov_policy *sg_policy)
> >
> >  	sg_policy->thread = thread;
> >  	kthread_bind_mask(thread, policy->related_cpus);
> > -	init_irq_work(&sg_policy->irq_work, sugov_irq_work);
> > +	sg_policy->irq_work = IRQ_WORK_INIT_HARD(sugov_irq_work);
> >  	mutex_init(&sg_policy->work_lock);
> >
> >  	wake_up_process(thread);
>
> Will this have any impact on the non-preempt-rt case ? Otherwise,

My understanding is that in the non-PREEMPT_RT case the work is queued to
raised_list instead, and arch_irq_work_raise() is called immediately to raise
an IPI to serve it. So the behaviour there is already similar to what this
patch does for the PREEMPT_RT case; see __irq_work_queue_local() in
kernel/irq_work.c:

	/* Enqueue on current CPU, work must already be claimed and preempt disabled */
	static void __irq_work_queue_local(struct irq_work *work)
	{
		struct llist_head *list;
		bool lazy_work;
		int work_flags;

		work_flags = atomic_read(&work->node.a_flags);
		if (work_flags & IRQ_WORK_LAZY)
			lazy_work = true;
		else if (IS_ENABLED(CONFIG_PREEMPT_RT) &&
			 !(work_flags & IRQ_WORK_HARD_IRQ))
			lazy_work = true;
		else
			lazy_work = false;
		...

I have also tested on the mainline and RT trees with CONFIG_PREEMPT selected
and could not reproduce this issue.

Regards,
Ran

> Acked-by: Viresh Kumar <viresh.kumar@xxxxxxxxxx>
>
> --
> viresh
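
For reference, the only difference between the two initializers used in the
patch is the initial flag bits: IRQ_WORK_INIT_HARD() marks the work with
IRQ_WORK_HARD_IRQ, which is exactly the bit the PREEMPT_RT branch quoted above
tests before deciding between lazy_list and raised_list. A rough sketch of the
relevant helpers from include/linux/irq_work.h (paraphrased from memory for a
v5.11-era tree; the exact form may differ between kernel versions):

	/* Common initializer: sets the handler and the initial flag bits. */
	#define __IRQ_WORK_INIT(_func, _flags) (struct irq_work){	\
		.node = { .a_flags = ATOMIC_INIT(_flags), },		\
		.func = (_func),					\
	}

	#define IRQ_WORK_INIT(_func)      __IRQ_WORK_INIT(_func, 0)
	#define IRQ_WORK_INIT_HARD(_func) __IRQ_WORK_INIT(_func, IRQ_WORK_HARD_IRQ)

	/*
	 * init_irq_work() leaves the flags at 0, so on PREEMPT_RT the work
	 * ends up on lazy_list; IRQ_WORK_INIT_HARD() sets IRQ_WORK_HARD_IRQ,
	 * so __irq_work_queue_local() queues it to raised_list and raises an
	 * IPI right away, as in the non-PREEMPT_RT case.
	 */
	static inline void
	init_irq_work(struct irq_work *work, void (*func)(struct irq_work *))
	{
		*work = IRQ_WORK_INIT(func);
	}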