On Mon, 13 Jan 2020 at 20:36, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>
> On Mon, Jan 13, 2020 at 12:52:20PM +0100, Paolo Bonzini wrote:
> > On 13/01/20 11:43, Peter Zijlstra wrote:
> > > So the very first thing we need to get sorted is that MPERF/TSC ratio
> > > thing. TurboStat does it, but has 'funny' hacks on like:
> > >
> > > b2b34dfe4d9a ("tools/power turbostat: KNL workaround for %Busy and Avg_MHz")
> > >
> > > and I imagine that there's going to be more exceptions there. You're
> > > basically going to have to get both Intel and AMD to commit to this.
> > >
> > > IFF we can get consensus on MPERF/TSC, then yes, that is a reasonable
> > > way to detect a VCPU being idle I suppose. I've added a bunch of people
> > > who seem to know about this.
> > >
> > > Anyone, what will it take to get MPERF/TSC 'working' ?
> >
> > Do we really need MPERF/TSC for this use case, or can we just track
> > APERF as well and do MPERF/APERF to compute the "non-idle" time?
>
> So MPERF runs at fixed frequency (when !IDLE and typically the same
> frequency as TSC), APERF runs at variable frequency (when !IDLE)
> depending on DVFS state.
>
> So APERF/MPERF gives the effective frequency of the core, but since both
> stop during IDLE, it will not be a good indication of IDLE.
>
> Otoh, TSC doesn't stop in idle (.oO this depends on
> X86_FEATURE_CONSTANT_TSC) and therefore the MPERF/TSC ratio gives how
> much !idle time there was between readings.

Do you have a better solution for penalizing a vCPU process that executes
mwait/hlt inside the guest? :)

    Wanpeng
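
P.S. For anyone following along, the two ratios discussed above can be sketched roughly as follows. This is only an illustration in Python, not kernel code: the `Sample` type, the function names and the counter values are all hypothetical, and it assumes MPERF ticks at the same rate as TSC (which Peter notes is only "typically" true) and that the TSC keeps counting while the CPU is idle.

```python
from dataclasses import dataclass
from typing import Optional

# The counters are 64-bit and may wrap between two readings.
MASK64 = (1 << 64) - 1


@dataclass(frozen=True)
class Sample:
    """One snapshot of the three counters (hypothetical container)."""
    tsc: int    # keeps counting in idle (assumed)
    mperf: int  # fixed rate, counts only when !idle
    aperf: int  # variable rate (DVFS), counts only when !idle


def delta(new: int, old: int) -> int:
    """Wraparound-safe difference of two 64-bit counter readings."""
    return (new - old) & MASK64


def busy_fraction(prev: Sample, cur: Sample) -> float:
    """MPERF/TSC: share of the interval the CPU spent !idle.

    Assumes MPERF and TSC tick at the same frequency.
    """
    d_tsc = delta(cur.tsc, prev.tsc)
    if d_tsc == 0:
        return 0.0
    return delta(cur.mperf, prev.mperf) / d_tsc


def aperf_mperf_ratio(prev: Sample, cur: Sample) -> Optional[float]:
    """APERF/MPERF: effective frequency relative to the fixed MPERF rate.

    Returns None when the CPU was idle for the whole interval, because
    both counters stopped and the ratio says nothing about idleness.
    """
    d_mperf = delta(cur.mperf, prev.mperf)
    if d_mperf == 0:
        return None
    return delta(cur.aperf, prev.aperf) / d_mperf
```

With made-up readings where TSC advances by 1000, MPERF by 250 and APERF by 500, `busy_fraction` gives 0.25 and `aperf_mperf_ratio` gives 2.0. A fully idle interval yields 0.0 and `None`, which is the reason APERF/MPERF alone cannot be used to detect an idle vCPU.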