RE: [edk2] KVM: MTRR: fix memory type handling if MTRR is completely disabled

"Kinney, Michael D" <michael.d.kinney@xxxxxxxxx> · Thu, 15 Oct 2015 16:53:25 +0000

Laszlo,

There is already a PCD for this timeout that is used by CpuMpPei.

	gUefiCpuPkgTokenSpaceGuid.PcdCpuApInitTimeOutInMicroSeconds

I noticed that CpuDxe is using a hard-coded AP timeout.  I think we should just use this same PCD for both the PEI and DXE CPU modules and then set it to a compatible value for OVMF.
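
As a rough illustration of why the timeout value matters (a sketch only, not the actual CpuDxe code), the standalone C fragment below models the BSP counting only the APs that check in before a fixed wait expires. GetApInitTimeoutUs and mApInitTimeoutUs are hypothetical stand-ins for reading the PCD (in edk2 this would presumably be PcdGet32 (PcdCpuApInitTimeOutInMicroSeconds)), and the AP arrival times are made-up numbers, not measurements.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the PCD value. In edk2 the timeout would be
   read with PcdGet32 (PcdCpuApInitTimeOutInMicroSeconds) and set per
   platform in the DSC file; a plain variable is used here so the sketch
   compiles outside edk2. */
static uint32_t mApInitTimeoutUs = 100 * 1000;

static uint32_t GetApInitTimeoutUs(void)
{
  return mApInitTimeoutUs;
}

/* Return the number of processors (BSP plus APs) the BSP would count if
   it waits TimeoutUs microseconds and then stops looking.
   ApArrivalUs[i] is the simulated time at which AP i checks in. */
static size_t CountCpusAfterWait(const uint32_t *ApArrivalUs,
                                 size_t ApCount,
                                 uint32_t TimeoutUs)
{
  size_t Count = 1; /* the BSP itself */
  size_t Index;

  assert(ApArrivalUs != NULL || ApCount == 0);

  for (Index = 0; Index < ApCount; Index++) {
    /* An AP that arrives after the wait has expired is never counted,
       even though it is still running. */
    if (ApArrivalUs[Index] <= TimeoutUs) {
      Count++;
    }
  }
  return Count;
}

/* Same count, but with the wait taken from the configurable value
   instead of a constant written into the code. */
static size_t CountCpusWithConfiguredTimeout(const uint32_t *ApArrivalUs,
                                             size_t ApCount)
{
  return CountCpusAfterWait(ApArrivalUs, ApCount, GetApInitTimeoutUs());
}
```

With three simulated APs arriving at 250 ms, 400 ms and 900 ms, a 100 ms wait reports 1 CPU, while a 1 second wait reports all 4. Making the wait configurable lets a platform such as OVMF pick the longer value without touching the shared code.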

Mike

>-----Original Message-----
>From: edk2-devel [mailto:edk2-devel-bounces@xxxxxxxxxxxx] On Behalf Of
>Laszlo Ersek
>Sent: Thursday, October 15, 2015 9:19 AM
>To: Xiao Guangrong
>Cc: kvm@xxxxxxxxxxxxxxx; Justen, Jordan L; edk2-devel@xxxxxxxxxxx; Alex
>Williamson; Chen Fan; Paolo Bonzini; Wanpeng Li
>Subject: Re: [edk2] KVM: MTRR: fix memory type handling if MTRR is
>completely disabled
>
>CC'ing Jordan and Chen Fan.
>
>On 10/15/15 09:10, Xiao Guangrong wrote:
>>
>>
>> On 10/15/2015 02:58 PM, Janusz wrote:
>>> On 15.10.2015 at 08:41, Xiao Guangrong wrote:
>>>>
>>>>
>>>> On 10/15/2015 02:19 PM, Janusz wrote:
>>>>> On 15.10.2015 at 06:19, Xiao Guangrong wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> Well, the bug may not be in KVM. When this bug happened, I saw that
>>>>>> OVMF detected only 1 CPU. Here is the log from OVMF's debug output:
>>>>>>
>>>>>>     Flushing GCD
>>>>>>     Flushing GCD
>>>>>>     Flushing GCD
>>>>>>     Flushing GCD
>>>>>>     Flushing GCD
>>>>>>     Flushing GCD
>>>>>>     Flushing GCD
>>>>>>     Flushing GCD
>>>>>>     Flushing GCD
>>>>>>     Flushing GCDs
>>>>>> Detect CPU count: 1
>>>>>>
>>>>>> So the startup code has been freed while the APs are still
>>>>>> running; I think that is why we saw the vCPUs executing at
>>>>>> unexpected addresses.
>>>>>>
>>>>>> After digging into OVMF's code, I noticed that the BSP waits for the
>>>>>> APs for a fixed period. However, recent KVM changes require zapping
>>>>>> all mappings whenever CR0.CD changes, which means the APs need more
>>>>>> time to start up.
>>>>>>
>>>>>> With the following change to OVMF, the bug is completely gone on my
>>>>>> side:
>>>>>>
>>>>>> --- a/UefiCpuPkg/CpuDxe/ApStartup.c
>>>>>> +++ b/UefiCpuPkg/CpuDxe/ApStartup.c
>>>>>> @@ -454,7 +454,9 @@ StartApsStackless (
>>>>>>      //
>>>>>>      // Wait 100 milliseconds for APs to arrive at the ApEntryPoint routine
>>>>>>      //
>>>>>> -  MicroSecondDelay (100 * 1000);
>>>>>> +  MicroSecondDelay (10 * 100 * 1000);
>>>>>>
>>>>>>      return EFI_SUCCESS;
>>>>>>    }
>>>>>>
>>>>>> Janusz, could you please check this instead? You can switch to your
>>>>>> previous kernel to do this test.
>>>>>>
>>>>>>
>>>>> OK, the first time I started the VM, the system booted
>>>>> successfully. When I turned it off and started it again, the VM
>>>>> restarted a couple of times during system boot. Sometimes I also get
>>>>> very high CPU usage for no reason. Also, I get lower fps in GTA 5
>>>>> than with kernel 4.1: something like 30-55, whereas on 4.1 I get
>>>>> 60 fps all the time. This is my new log:
>>>>> https://bpaste.net/show/61a122ad7fe5
>>>>>
>>>>
>>>> Just to confirm: the QEMU internal error did not appear any more, right?
>>> Yes, when I reverted your first patch, switched from -vga none to
>>> -vga std, and didn't pass through my GPU (the case in which I got this
>>> internal error), the VM started without problems. I didn't even get
>>> any VM restarts like with passthrough.
>>>
>>
>> Wow, it seems we have fixed the QEMU internal error now. :)
>>
>> Recently, Paolo has reverted some MTRR patches; was your test
>> based on these reverted patches?
>>
>> The GPU passthrough issue may be related to vfio (not sure), Alex, do
>> you have any idea?
>>
>> Laszlo, could you please check whether the root cause is reasonable and
>> fix it in OVMF if it's right?
>
>The code that you have found is in edk2's EFI_MP_SERVICES_PROTOCOL
>implementation -- more closely, its initial CPU counter code --, from
>edk2 git commit 533263ee5a7f. It is not specific to OVMF -- it is
>generic edk2 code for Intel processors. (I'm CC'ing Jordan and Chen Fan
>because they authored the patch in question.)
>
>If VCPUs need more time to rendezvous than written in the code, on
>recent KVM, then I think we should introduce a new FixedPCD in
>UefiCpuPkg (practically: a compile time constant) for the timeout. Which
>is not hard to do.
>
>However, we'll need two things:
>- an idea about the concrete rendezvous timeout to set, from OvmfPkg
>
>- a *detailed* explanation / elaboration on your words:
>
>  "KVM recent changes require zap all mappings if CR0.CD is changed,
>  that means the APs need more time to startup"
>
>  Preferably with references to Linux kernel commits and the Intel SDM,
>  so that n00bs like me can get a fleeting idea. Do you mean that with
>  caching disabled, the APs execute their rendezvous code (from memory)
>  more slowly?
>
>> BTW, OVMF handles #UD with no trace - nothing is killed, and no call trace
>> in the debug output...
>
>There *is* a trace (of any unexpected exception -- at least for the
>BSP), but unfortunately its location is not intuitive.
>
>The exception handler that is built into OVMF
>("UefiCpuPkg/Library/CpuExceptionHandlerLib") is again generic edk2
>code, and it prints the trace directly to the serial port, regardless of
>the fact that OVMF's DebugLib instance logs explicit DEBUGs to the QEMU
>debug port. (The latter can be directed to the serial port as well, if
>you build OVMF with -D DEBUG_ON_SERIAL_PORT, but this is not relevant
>here.)
>
>If you reproduce the issue while looking at the (virtual) serial port of
>the guest, I trust you will get a register dump.
>
>Thanks!
>Laszlo