On Sat, 11 May 2019 at 01:17, Sean Christopherson <sean.j.christopherson@xxxxxxxxx> wrote:
>
> On Fri, May 10, 2019 at 11:34:41AM +0100, Joao Martins wrote:
> > On 5/10/19 10:54 AM, Wanpeng Li wrote:
> > > It is weird that we can observe intel_idle driver in the guest
> > > executes mwait eax=0x20, and the corresponding pCPU enters C3 on HSW
> > > server, however, we can't observe this on SKX/CLX server, it just
> > > enters maximal C1.
> >
> > I assume you refer to the case where you pass the host mwait substates to the
> > guests as is, right? Or are you zeroing/filtering out the mwait cpuid leaf EDX
> > like my patch (attached in the previous message) suggests?
> >
> > Interestingly, hints set to 0x20 actually corresponds to C6 on HSW (based on
> > intel_idle driver). IIUC From the SDM (see Vol 2B, "MWAIT for Power Management"
> > in instruction set reference M-U) the hints register, doesn't necessarily
> > guarantee the specified C-state depicted in the hints will be used. The manual
> > makes it sound like it is tentative, and implementation-specific condition may
> > either ignore it or enter a different one. It appears to be only guaranteed that
> > it won't enter a C-{sub,}state deeper than the one depicted.
>
> Yep, section "MWAIT EXTENSIONS FOR ADVANCED POWER MANAGEMENT" is more
> explicit on this point:
>
>   At CPL=0, system software can specify desired C-state and sub C-state by
>   using the MWAIT hints register (EAX). Processors will not go to C-state
>   and sub C-state deeper than what is specified by the hint register.
>
> As for why SKX/CLX only enters C1, AFAICT SKX isn't configured to support
> C3, e.g. skx_cstates in drivers/idle/intel_idle.c shows C1, C1E and C6.
> A quick search brings up a variety of docs that confirm this. My guess is
> that C1E provides better power/performance than C3 for the majority of
> server workloads, e.g. C3 doesn't provide enough power savings to justify
> its higher latency and TLB flush.

You are right. I figured this out by referring to the SKX/CLX EDS: the
core C-states of these two generations only support CC0/CC1/CC1E/CC6.

The issue here is that, after exposing mwait to the guest, an SKX/CLX
guest can't enter CC6, whereas an HSW guest can enter CC3/CC6. Both HSW
and SKX/CLX hosts can enter CC6. We observe SKX/CLX guests executing
mwait eax=0x20, but we can't observe the corresponding pCPU entering CC6,
either with turbostat or by reading MSR_CORE_C6_RESIDENCY directly.

Regards,
Wanpeng Li
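
For reference, the hint decoding and the residency check discussed above can be sketched roughly as follows. This is only an illustration, not what turbostat or intel_idle actually do: the two hint tables are partial and written from memory of drivers/idle/intel_idle.c (verify them against your kernel tree), the MSR index 0x3FD is assumed from msr-index.h, and the helper names are made up. Reading /dev/cpu/N/msr assumes the msr module is loaded and the script runs as root on the host.

```python
import os
import struct

# Assumed index of MSR_CORE_C6_RESIDENCY (arch/x86/include/asm/msr-index.h).
MSR_CORE_C6_RESIDENCY = 0x3FD

# Partial hint -> state-name tables, as I recall them from intel_idle.c.
# HSW has deeper states that are omitted here; SKX has no C3 entry.
HSW_HINTS = {0x00: "C1", 0x01: "C1E", 0x10: "C3", 0x20: "C6"}
SKX_HINTS = {0x00: "C1", 0x01: "C1E", 0x20: "C6"}


def decode_mwait_hint(eax):
    """Split an MWAIT hint (EAX) into (C-state field, sub-state field)."""
    cstate_field = (eax >> 4) & 0xF  # EAX[7:4]
    substate = eax & 0xF             # EAX[3:0]
    return cstate_field, substate


def read_msr(cpu, msr):
    """Read one 64-bit MSR on the given pCPU through the msr driver."""
    fd = os.open(f"/dev/cpu/{cpu}/msr", os.O_RDONLY)
    try:
        return struct.unpack("<Q", os.pread(fd, 8, msr))[0]
    finally:
        os.close(fd)


def c6_residency_delta(cpu, run):
    """Return how far the C6 residency counter advanced while run() executed."""
    before = read_msr(cpu, MSR_CORE_C6_RESIDENCY)
    run()
    return read_msr(cpu, MSR_CORE_C6_RESIDENCY) - before
```

With the vCPU pinned to a pCPU, something like `c6_residency_delta(cpu, lambda: time.sleep(5))` (after `import time`) staying at zero while the guest idles would match the SKX/CLX behaviour described above. The hint only sets an upper bound on depth, so a zero delta alone does not say why the core stayed shallower.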