On 5/10/19 10:54 AM, Wanpeng Li wrote: > Hi all, > On Wed, 17 Apr 2019 at 03:18, Joao Martins <joao.m.martins@xxxxxxxxxx> wrote: >> >> On 4/16/19 4:40 PM, Liran Alon wrote: >>>> On 16 Apr 2019, at 18:21, Liran Alon <liran.alon@xxxxxxxxxx> wrote: >>>>> On 15 Apr 2019, at 21:17, Sean Christopherson <sean.j.christopherson@xxxxxxxxx> wrote: >>>>> On Mon, Apr 15, 2019 at 06:45:26PM +0300, Liran Alon wrote: >>>>> >>>>> Technically, I think this is a Qemu bug. KVM reports all zeros for >>>>> CPUID_MWAIT_LEAF when userspace queries KVM_GET_SUPPORTED_CPUID and >>>>> KVM_GET_EMULATED_CPUID. And I think that's correct/desired, supporting >>>>> MONITOR/MWAIT sub-features should be a separate enabling patch set. >>>> >>>> At some point in time Jim added commit df9cb9cc5bcd ("kvm: x86: Amend the KVM_GET_SUPPORTED_CPUID API documentation”) >>>> which added the following paragraph to documentation: >>>> "Note that certain capabilities, such as KVM_CAP_X86_DISABLE_EXITS, may >>>> expose cpuid features (e.g. MONITOR) which are not supported by kvm in >>>> its default configuration. If userspace enables such capabilities, it >>>> is responsible for modifying the results of this ioctl appropriately.” >>>> >>>> It’s indeed not clear what it means to “modify the results of this ioctl *appropriately*” right? >>>> It can either mean you just expose in CPUID[EAX=1].ECX support for MONITOR/MWAIT >>>> or that you also expose CPUID_MWAIT_LEAF (CPUID[EAX=5]). >>>> Both regardless of the value returned from KVM_GET_SUPPORTED_CPUID ioctl. >>>> >>>> Having said that, I tend to agree with you. >>>> Instead of emulating this MSR in KVM, I think it it preferred to change QEMU to expose MONITOR/MWAIT support in CPUID[EAX=1].ECX >>>> but in CPUID[EAX=5] init everything as in host besides ECX[0] which will be set to 0 to report we don’t support extensions. >>>> (We still want to support range of monitor line size, whether we can treat interrupts as break-events for MWAIT and the supported C substates). >>>> I will create this patch for QEMU. >>> >>> Actually on second thought, I will just remain with the KVM patch (that Paolo was nice enough to already queue). >>> and not do this QEMU patch. This is because why will we want to prevent guest from specifying target C-State if he is exposed with MWAIT? >>> I don’t see a reason we should prevent that. Do you? >>> >> One reason it is a good idea to prevent the guest from entering deeper >> C-states (e.g. deeper than C2) is to allow preemption timer to be used again >> when guests are exposed with MWAIT (currently we can't do that). > > It is weird that we can observe intel_idle driver in the guest > executes mwait eax=0x20, and the corresponding pCPU enters C3 on HSW > server, however, we can't observe this on SKX/CLX server, it just > enters maximal C1. I assume you refer to the case where you pass the host mwait substates to the guests as is, right? Or are you zeroing/filtering out the mwait cpuid leaf EDX like my patch (attached in the previous message) suggests? Interestingly, hints set to 0x20 actually corresponds to C6 on HSW (based on intel_idle driver). IIUC From the SDM (see Vol 2B, "MWAIT for Power Management" in instruction set reference M-U) the hints register, doesn't necessarily guarantee the specified C-state depicted in the hints will be used. The manual makes it sound like it is tentative, and implementation-specific condition may either ignore it or enter a different one. It appears to be only guaranteed that it won't enter a C-{sub,}state deeper than the one depicted. > All the setups between HSW and SKX/CLX are the > same. Sean and Liran, any opinions? > > Regards, > Wanpeng Li >