On Tue, Jan 28, 2020 at 10:42 AM Nadav Amit <nadav.amit@xxxxxxxxx> wrote:
>
> > On Jan 28, 2020, at 10:33 AM, Sean Christopherson <sean.j.christopherson@xxxxxxxxx> wrote:
> >
> > On Tue, Jan 28, 2020 at 09:59:45AM -0800, Jim Mattson wrote:
> >> On Mon, Jan 27, 2020 at 12:56 PM Sean Christopherson
> >> <sean.j.christopherson@xxxxxxxxx> wrote:
> >>> On Mon, Jan 27, 2020 at 11:24:31AM -0800, Jim Mattson wrote:
> >>>> On Sun, Jan 26, 2020 at 8:36 PM Nadav Amit <nadav.amit@xxxxxxxxx> wrote:
> >>>>>> On Jan 26, 2020, at 2:06 PM, Jim Mattson <jmattson@xxxxxxxxxx> wrote:
> >>>>>>
> >>>>>> If I had to guess, you probably have SMM malware on your host. Remove
> >>>>>> the malware, and the test should pass.
> >>>>>
> >>>>> Well, malware will always be an option, but I doubt this is the case.
> >>>>
> >>>> Was my innuendo too subtle? I consider any code executing in SMM to be malware.
> >>>
> >>> SMI complications seem unlikely. The straw that broke the camel's back
> >>> was a 1152 cycle delta, presumably the other failing runs had similar deltas.
> >>> I've never benchmarked SMI+RSM, but I highly doubt it comes anywhere close
> >>> to VM-Enter/VM-Exit's super optimized ~400 cycle round trip. E.g. I
> >>> wouldn't be surprised if just SMI+RSM is over 1500 cycles.
> >>
> >> Good point. What generation of hardware are you running on, Nadav?
> >
> > Skylake.
>
> Indeed. Thanks for answering on my behalf ;-)
>
> >
> >>>>> Interestingly, in the last few times the failure did not reproduce. Yet,
> >>>>> thinking about it made me concerned about MTRRs configuration, and that
> >>>>> perhaps performance is affected by memory marked as UC after boot, since
> >>>>> kvm-unit-test does not reset MTRRs.
> >>>>>
> >>>>> Reading the variable range MTRRs, I do see some ranges marked as UC (most of
> >>>>> the range 2GB-4GB, if I read the MTRRs correctly):
> >>>>>
> >>>>> MSR 0x200 = 0x80000000
> >>>>> MSR 0x201 = 0x3fff80000800
> >>>>> MSR 0x202 = 0xff000005
> >>>>> MSR 0x203 = 0x3fffff000800
> >>>>> MSR 0x204 = 0x38000000000
> >>>>> MSR 0x205 = 0x3f8000000800
> >>>>>
> >>>>> Do you think we should set the MTRRs somehow in KVM-unit-tests? If yes, can
> >>>>> you suggest a reasonable configuration?
> >>>>
> >>>> I would expect MTRR issues to result in repeatable failures. For
> >>>> instance, if your VMCS ended up in UC memory, that might slow things
> >>>> down quite a bit. But, I would expect the VMCS to end up at the same
> >>>> address each time the test is run.
> >>>
> >>> Agreed on the repeatable failures part, but putting the VMCS in UC memory
> >>> shouldn't affect this type of test. The CPU's internal VMCS cache isn't
> >>> coherent, and IIRC isn't disabled if the MTRRs for the VMCS happen to be
> >>> UC.
> >>
> >> But the internal VMCS cache only contains selected fields, doesn't it?
> >> Uncached fields would have to be written to memory on VM-exit. Or are
> >> all of the mutable fields in the internal VMCS cache?
> >
> > Hmm. I can neither confirm nor deny? The official Intel response to this
> > would be "it's microarchitectural". I'll put it this way: it's in Intel's
> > best interest to minimize the latency of VMREAD, VMWRITE, VM-Enter and
> > VM-Exit.
>
> I will run some more experiments and get back to you. It is a shame that
> every experiment requires a (real) boot…

Yes! It's not just a shame; it's a serious usability issue.
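
For reference, here is a minimal sketch of how the variable-range MTRR pairs quoted above can be decoded. It assumes the usual IA32_MTRR_PHYSBASEn/IA32_MTRR_PHYSMASKn layout (memory type in bits 7:0 of the base MSR, valid flag in bit 11 of the mask MSR, address bits from bit 12 up) and a contiguous mask. The helper name decode_variable_mtrr is hypothetical and is not part of kvm-unit-tests.

```python
# Memory type encodings used by the MTRRs (other values are reserved).
MTRR_TYPES = {0: "UC", 1: "WC", 4: "WT", 5: "WP", 6: "WB"}

VALID_BIT = 1 << 11   # "valid" flag in IA32_MTRR_PHYSMASKn
ADDR_BITS = ~0xFFF    # address fields start at bit 12


def decode_variable_mtrr(physbase, physmask):
    """Decode one PHYSBASE/PHYSMASK pair (hypothetical helper).

    Returns (base, size, type_name), or None if the pair is not valid.
    Assumes the mask is contiguous, so the range size is the lowest
    set bit of the mask.
    """
    if not physmask & VALID_BIT:
        return None
    mem_type = physbase & 0xFF
    base = physbase & ADDR_BITS
    mask = physmask & ADDR_BITS
    size = mask & -mask  # isolate the lowest set bit of the mask
    return base, size, MTRR_TYPES.get(mem_type, "reserved")


# Register values copied from the message quoted above.
pairs = [
    (0x80000000, 0x3FFF80000800),     # MSR 0x200 / 0x201
    (0xFF000005, 0x3FFFFF000800),     # MSR 0x202 / 0x203
    (0x38000000000, 0x3F8000000800),  # MSR 0x204 / 0x205
]

for physbase, physmask in pairs:
    decoded = decode_variable_mtrr(physbase, physmask)
    if decoded is not None:
        base, size, type_name = decoded
        print(f"{base:#x}-{base + size - 1:#x} {type_name}")
```

Under these assumptions the first pair decodes to a 2GB UC range starting at 2GB, which is consistent with Nadav's reading of the 2GB-4GB range.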