Re: KVM-unit-tests on AMD

> On Oct 9, 2019, at 11:53 AM, Cathy Avery <cavery@xxxxxxxxxx> wrote:
> 
> On 10/9/19 1:32 PM, Nadav Amit wrote:
>> On Oct 9, 2019, at 4:39 AM, Cathy Avery <cavery@xxxxxxxxxx> wrote:
>>> On 10/8/19 4:02 PM, Nadav Amit wrote:
>>>>> On Oct 8, 2019, at 9:30 AM, Nadav Amit <nadav.amit@xxxxxxxxx> wrote:
>>>>> 
>>>>>> On Oct 8, 2019, at 5:19 AM, Vitaly Kuznetsov <vkuznets@xxxxxxxxxx> wrote:
>>>>>> 
>>>>>> Nadav Amit <nadav.amit@xxxxxxxxx> writes:
>>>>>> 
>>>>>>> Is kvm-unit-tests supposed to pass on AMD machines or AMD VMs?
>>>>>> It is supposed to but it doesn't :-) Actually, not only kvm-unit-tests
>>>>>> but the whole SVM would appreciate some love ...
>>>>>> 
>>>>>>> Clearly, I ask since they do not pass on AMD on bare-metal.
>>>>>> On my AMD EPYC 7401P 24-Core Processor bare metal I get the following
>>>>>> failures:
>>>>>> 
>>>>>> FAIL vmware_backdoors (11 tests, 8 unexpected failures)
>>>>>> 
>>>>>> (Why can't we just check
>>>>>> /sys/module/kvm/parameters/enable_vmware_backdoor btw???)
>>>>>> 
>>>>>> FAIL svm (15 tests, 1 unexpected failures)
>>>>>> 
>>>>>> There is a patch for that:
>>>>>> 
>>>>>> https://lore.kernel.org/kvm/d3eeb3b5-13d7-34d2-4ce0-fdd534f2bcc3@xxxxxxxxxx/T/#t
>>>>>> 
>>>>>> Inside a VM on this host I see the following:
>>>>>> 
>>>>>> FAIL apic-split (timeout; duration=90s)
>>>>>> FAIL apic (timeout; duration=30)
>>>>>> 
>>>>>> (I manually increased the timeout but it didn't help - this is worrisome,
>>>>>> most likely this is a hang)
>>>>>> 
>>>>>> FAIL vmware_backdoors (11 tests, 8 unexpected failures)
>>>>>> 
>>>>>> - same as on bare metal
>>>>>> 
>>>>>> FAIL port80 (timeout; duration=90s)
>>>>>> 
>>>>>> - hang again?
>>>>>> 
>>>>>> FAIL svm (timeout; duration=90s)
>>>>>> 
>>>>>> - most likely a hang but this is 3-level nesting so oh well..
>>>>>> 
>>>>>> FAIL kvmclock_test
>>>>>> 
>>>>>> - bad but maybe something is wrong with TSC on the host? Need to
>>>>>> investigate ...
>>>>>> 
>>>>>> FAIL hyperv_clock
>>>>>> 
>>>>>> - this is expected as it doesn't work when the clocksource is not TSC
>>>>>> (e.g. kvm-clock)
>>>>>> 
>>>>>> Are you seeing different failures?
>>>>> Thanks for your quick response.
>>>>> 
>>>>> I only ran the “apic” tests so far and I got the following failures:
>>>>> 
>>>>> FAIL: correct xapic id after reset
>>>>> …
>>>>> x2apic not detected
>>>>> FAIL: enable unsupported x2apic
>>>>> FAIL: apicbase: relocate apic
>>>>> 
>>>>> The test gets stuck after “apicbase: reserved low bits”.
>>>>> 
>>>>> Well, I understand it is not a bare-metal thing.
>>>> I ran the SVM test, and on bare-metal it does not pass.
>>>> 
>>>> I don’t have the AMD machine for long enough to fix the issues, but for the
>>>> record, here are test failures and crashes I encountered while running the
>>>> tests on bare-metal.
>>>> 
>>>> Failures:
>>>> - cr3 read intercept emulate
>>>> - npt_nx
>>>> - npt_rsvd
>>>> - npt_rsvd_pfwalk
>>>> - npt_rw_pfwalk
>>>> - npt_rw_l1mmio
>>>> 
>>>> Crashes:
>>>> - test_dr_intercept - Access to DR4 causes #UD
>>>> - tsc_adjust_prepare - MSR access causes #GP
>>>> 
>>> Interesting. I just ran the latest on bare-metal and it did pass.
>>> 
>>> enabling apic
>>> enabling apic
>>> paging enabled
>>> cr0 = 80010011
>>> cr3 = 62a000
>>> cr4 = 20
>>> NPT detected - running all tests with NPT enabled
>>> PASS: null
>>> PASS: vmrun
>>> PASS: ioio
>>> PASS: vmrun intercept check
>>> PASS: cr3 read intercept
>>> PASS: cr3 read nointercept
>>> PASS: cr3 read intercept emulate
>>> PASS: dr intercept check
>>> PASS: next_rip
>>> PASS: msr intercept check
>>> PASS: mode_switch
>>> PASS: asid_zero
>>> PASS: sel_cr0_bug
>>> PASS: npt_nx
>>> PASS: npt_us
>>> PASS: npt_rsvd
>>> PASS: npt_rw
>>> PASS: npt_rsvd_pfwalk
>>> PASS: npt_rw_pfwalk
>>> PASS: npt_l1mmio
>>> PASS: npt_rw_l1mmio
>>> PASS: tsc_adjust
>>>     Latency VMRUN : max: 49300 min: 3160 avg: 3228
>>>     Latency VMEXIT: max: 607780 min: 2940 avg: 2999
>>> PASS: latency_run_exit
>>>     Latency VMLOAD: max: 29720 min: 300 avg: 306
>>>     Latency VMSAVE: max: 31660 min: 280 avg: 282
>>>     Latency STGI:   max: 18860 min: 40 avg: 54
>>>     Latency CLGI:   max: 16060 min: 40 avg: 53
>>> PASS: latency_svm_insn
>>> SUMMARY: 24 tests
>> Just to make sure, you actually ran it on bare-metal? Without KVM?
>> 
> The tests were run on a Fedora 29 server with recent upstream kernel, qemu, and yes with KVM.

So I am referring to a different setup: no KVM, with the tests running
directly on bare metal. Running the tests this way exposes bugs in the
tests themselves, since they sometimes make wrong assumptions about
how the hardware behaves.

And of course bugs in the tests sometimes indicate bugs in KVM as
well. For instance:

https://lore.kernel.org/kvm/20190919125211.18152-6-liran.alon@xxxxxxxxxx/T/#ebb7e52ae77cccbc8b2455466b34c2e28a9b4c56d
https://patchwork.kernel.org/patch/10951713/
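
On the question quoted above about checking
/sys/module/kvm/parameters/enable_vmware_backdoor: a minimal sketch of
what such a host-side check might look like. It assumes the check runs
on the host rather than inside the test guest, and that the parameter
is printed as "Y" or "N". The function names are hypothetical and are
not part of kvm-unit-tests. When the file is not readable (for example
with no kvm module loaded, as in the bare-metal setup), it reports
"unknown" rather than guessing.

```python
from pathlib import Path
from typing import Optional

# Path quoted earlier in this thread.
PARAM_PATH = "/sys/module/kvm/parameters/enable_vmware_backdoor"


def parse_bool_param(text: str) -> bool:
    """Interpret the text of a boolean module parameter file.

    Assumption: boolean parameters are printed as "Y" or "N";
    "1" is accepted as well to be lenient.
    """
    return text.strip() in ("Y", "y", "1")


def vmware_backdoor_enabled(path: str = PARAM_PATH) -> Optional[bool]:
    """Return True/False, or None when the file cannot be read
    (e.g. the kvm module is not loaded)."""
    try:
        return parse_bool_param(Path(path).read_text())
    except OSError:
        return None


if __name__ == "__main__":
    state = vmware_backdoor_enabled()
    if state is None:
        print("SKIP vmware_backdoors (parameter not readable)")
    elif state:
        print("RUN vmware_backdoors")
    else:
        print("SKIP vmware_backdoors (enable_vmware_backdoor is off)")
```

A test runner could use the None/False cases to skip vmware_backdoors
instead of reporting unexpected failures.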




