On Tue, 31 Jul 2018 at 05:32, Wanpeng Li <kernellwp@xxxxxxxxx> wrote:
>
> > > >
> > > >  #if defined(CONFIG_IRQ_TIME_ACCOUNTING) || defined(CONFIG_PARAVIRT_TIME_ACCOUNTING)
> > > >         if ((irq_delta + steal) && sched_feat(NONTASK_CAPACITY))
> > > > -               sched_rt_avg_update(rq, irq_delta + steal);
> > > > +               update_irq_load_avg(rq, irq_delta + steal);
> > >
> > > I think we should not add steal time into irq load tracking, steal
> > > time is always 0 on native kernel which doesn't matter, what will
> > > happen when guest disables IRQ_TIME_ACCOUNTING and enables
> > > PARAVIRT_TIME_ACCOUNTING? Steal time is not the real irq util_avg. In
> > > addition, we haven't exposed power management for performance which
> > > means that e.g. schedutil governor can not cooperate with passive mode
> > > intel_pstate driver to tune the OPP. To decay the old steal time avg
> > > and add the new one just wastes CPU cycles.
> >
> > In fact, I have kept the same behavior as with rt_avg, which was
> > already adding steal time when computing scale_rt_capacity, which is
> > used to reflect the remaining capacity for FAIR tasks and is used in
> > load balance. I'm not sure that it's worth using different variables
> > for irq and steal.
> > That being said, I see a possible optimization in schedutil when
> > PARAVIRT_TIME_ACCOUNTING is enabled and IRQ_TIME_ACCOUNTING is disabled.
> > With this kind of config, scale_irq_capacity can be a nop for
> > schedutil but scales the utilization for scale_rt_capacity
>
> Yeah, this is what was in my mind before, you can make a patch for that. :)

OK, I'm going to prepare a patch.

Thanks

>
> Regards,
> Wanpeng Li
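The optimization discussed above can be illustrated with a small userspace sketch. It assumes that scale_irq_capacity() scales a value by the capacity left after irq (and here steal) time, i.e. util * (max - irq) / max. All sketch_* names and the irq_time_accounting flag are hypothetical stand-ins for the kernel code and Kconfig options; this is not the patch that was eventually posted.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical stand-in for the Kconfig state: true if
 * CONFIG_IRQ_TIME_ACCOUNTING is enabled. With only
 * CONFIG_PARAVIRT_TIME_ACCOUNTING, the "irq" signal holds steal time only.
 */
struct sketch_config {
	bool irq_time_accounting;
};

/* Assumed scale_irq_capacity() arithmetic: keep the non-irq fraction. */
static unsigned long sketch_scale_irq_capacity(unsigned long util,
					       unsigned long irq,
					       unsigned long max)
{
	assert(irq <= max); /* the irq/steal signal cannot exceed capacity */
	return util * (max - irq) / max;
}

/*
 * schedutil-like path: if the signal contains only steal time,
 * skip the scaling (the proposed nop).
 */
static unsigned long sketch_schedutil_util(const struct sketch_config *cfg,
					   unsigned long util,
					   unsigned long irq_steal,
					   unsigned long max)
{
	if (!cfg->irq_time_accounting)
		return util;
	return sketch_scale_irq_capacity(util, irq_steal, max);
}

/*
 * scale_rt_capacity-like path used by load balance: steal time still
 * reduces the capacity left for FAIR tasks, so always scale.
 */
static unsigned long sketch_fair_capacity(unsigned long capacity,
					  unsigned long irq_steal,
					  unsigned long max)
{
	return sketch_scale_irq_capacity(capacity, irq_steal, max);
}
```

With max = 1024 and a steal-only signal of 256, the schedutil-like path would keep a utilization of 512 unchanged, while the load-balance path would still report 768 of 1024 as available for FAIR tasks.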