> On Jan 28, 2020, at 10:43 AM, Jim Mattson <jmattson@xxxxxxxxxx> wrote:
>
> On Tue, Jan 28, 2020 at 10:42 AM Nadav Amit <nadav.amit@xxxxxxxxx> wrote:
>>> On Jan 28, 2020, at 10:33 AM, Sean Christopherson <sean.j.christopherson@xxxxxxxxx> wrote:
>>>
>>> On Tue, Jan 28, 2020 at 09:59:45AM -0800, Jim Mattson wrote:
>>>> On Mon, Jan 27, 2020 at 12:56 PM Sean Christopherson
>>>> <sean.j.christopherson@xxxxxxxxx> wrote:
>>>>> On Mon, Jan 27, 2020 at 11:24:31AM -0800, Jim Mattson wrote:
>>>>>> On Sun, Jan 26, 2020 at 8:36 PM Nadav Amit <nadav.amit@xxxxxxxxx> wrote:
>>>>>>>> On Jan 26, 2020, at 2:06 PM, Jim Mattson <jmattson@xxxxxxxxxx> wrote:
>>>>>>>>
>>>>>>>> If I had to guess, you probably have SMM malware on your host. Remove
>>>>>>>> the malware, and the test should pass.
>>>>>>>
>>>>>>> Well, malware will always be an option, but I doubt this is the case.
>>>>>>
>>>>>> Was my innuendo too subtle? I consider any code executing in SMM to be
>>>>>> malware.
>>>>>
>>>>> SMI complications seem unlikely. The straw that broke the camel's back
>>>>> was a 1152 cycle delta, and presumably the other failing runs had similar
>>>>> deltas. I've never benchmarked SMI+RSM, but I highly doubt it comes
>>>>> anywhere close to VM-Enter/VM-Exit's super optimized ~400 cycle round
>>>>> trip. E.g. I wouldn't be surprised if just SMI+RSM is over 1500 cycles.
>>>>
>>>> Good point. What generation of hardware are you running on, Nadav?
>>>
>>> Skylake.
>>
>> Indeed. Thanks for answering on my behalf ;-)
>>
>>>>>>> Interestingly, in the last few times the failure did not reproduce. Yet,
>>>>>>> thinking about it made me concerned about the MTRR configuration, and
>>>>>>> that performance is perhaps affected by memory marked as UC after boot,
>>>>>>> since kvm-unit-tests does not reset the MTRRs.
>>>>>>>
>>>>>>> Reading the variable range MTRRs, I do see some ranges marked as UC
>>>>>>> (most of the range 2GB-4GB, if I read the MTRRs correctly):
>>>>>>>
>>>>>>> MSR 0x200 = 0x80000000
>>>>>>> MSR 0x201 = 0x3fff80000800
>>>>>>> MSR 0x202 = 0xff000005
>>>>>>> MSR 0x203 = 0x3fffff000800
>>>>>>> MSR 0x204 = 0x38000000000
>>>>>>> MSR 0x205 = 0x3f8000000800
>>>>>>>
>>>>>>> Do you think we should set the MTRRs somehow in kvm-unit-tests? If yes,
>>>>>>> can you suggest a reasonable configuration?
>>>>>>
>>>>>> I would expect MTRR issues to result in repeatable failures. For
>>>>>> instance, if your VMCS ended up in UC memory, that might slow things
>>>>>> down quite a bit. But, I would expect the VMCS to end up at the same
>>>>>> address each time the test is run.
>>>>>
>>>>> Agreed on the repeatable failures part, but putting the VMCS in UC memory
>>>>> shouldn't affect this type of test. The CPU's internal VMCS cache isn't
>>>>> coherent, and IIRC isn't disabled if the MTRRs for the VMCS happen to be
>>>>> UC.
>>>>
>>>> But the internal VMCS cache only contains selected fields, doesn't it?
>>>> Uncached fields would have to be written to memory on VM-exit. Or are
>>>> all of the mutable fields in the internal VMCS cache?
>>>
>>> Hmm. I can neither confirm nor deny? The official Intel response to this
>>> would be "it's microarchitectural". I'll put it this way: it's in Intel's
>>> best interest to minimize the latency of VMREAD, VMWRITE, VM-Enter and
>>> VM-Exit.
>>
>> I will run some more experiments and get back to you. It is a shame that
>> every experiment requires a (real) boot…
>
> Yes! It's not just a shame; it's a serious usability issue.

The easy way to run these experiments would have been to use an Intel CRB
(Customer Reference Board), which boots relatively fast, with an ITP
(In-Target Probe). This would have simplified testing and debugging
considerably. Perhaps some sort of PXE boot would also be beneficial.
Unfortunately, I do not have the hardware, and so far it does not seem that
others care that much. Despite the usability issues, running the tests on
bare metal has already revealed several bugs in KVM (and one SDM issue),
which had gone unnoticed because the tests themselves were wrong.
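For reference, the MTRR dump quoted above can be decoded with a short script.
This is only a sketch: it assumes MAXPHYADDR = 46 (consistent with the
0x3fff… mask values in the dump) and follows the variable-range
IA32_MTRR_PHYSBASEn/PHYSMASKn layout from the SDM (type in bits 7:0 of
PHYSBASE, valid in bit 11 of PHYSMASK, base/mask in bits MAXPHYADDR-1:12).

```python
# Sketch: decode variable-range MTRR base/mask MSR pairs.
# Assumption: MAXPHYADDR = 46, as the 0x3fff... masks in the dump suggest.
MAXPHYADDR = 46
PHYS_MASK = (1 << MAXPHYADDR) - 1
TYPES = {0: "UC", 1: "WC", 4: "WT", 5: "WP", 6: "WB"}

def decode_mtrr(physbase, physmask):
    """Return (start, end, type) for one PHYSBASEn/PHYSMASKn pair,
    or None if the range is disabled (valid bit 11 of PHYSMASK clear)."""
    if not (physmask >> 11) & 1:
        return None
    base = physbase & PHYS_MASK & ~0xFFF
    mask = physmask & PHYS_MASK & ~0xFFF
    size = (~mask & PHYS_MASK) + 1   # mask bits are contiguous from the top
    mem_type = TYPES.get(physbase & 0xFF, "?")
    return (base, base + size - 1, mem_type)

# The three enabled pairs from the dump (MSRs 0x200-0x205)
pairs = [(0x80000000,    0x3fff80000800),
         (0xff000005,    0x3fffff000800),
         (0x38000000000, 0x3f8000000800)]
for pb, pm in pairs:
    r = decode_mtrr(pb, pm)
    if r:
        print("0x%012x-0x%012x %s" % r)
```

With these inputs the first pair decodes to 0x80000000-0xffffffff UC, i.e.
the entire 2GB-4GB range, which matches the reading above.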