Laszlo,

There is already a PCD for this timeout that is used by CpuMpPei.

  gUefiCpuPkgTokenSpaceGuid.PcdCpuApInitTimeOutInMicroSeconds

I noticed that CpuDxe is using a hard-coded AP timeout. I think we should
just use this same PCD for both the PEI and DXE CPU modules and then set it
for OVMF to a compatible value.

Mike

>-----Original Message-----
>From: edk2-devel [mailto:edk2-devel-bounces@xxxxxxxxxxxx] On Behalf Of
>Laszlo Ersek
>Sent: Thursday, October 15, 2015 9:19 AM
>To: Xiao Guangrong
>Cc: kvm@xxxxxxxxxxxxxxx; Justen, Jordan L; edk2-devel@xxxxxxxxxxx; Alex
>Williamson; Chen Fan; Paolo Bonzini; Wanpeng Li
>Subject: Re: [edk2] KVM: MTRR: fix memory type handling if MTRR is
>completely disabled
>
>CC'ing Jordan and Chen Fan.
>
>On 10/15/15 09:10, Xiao Guangrong wrote:
>>
>>
>> On 10/15/2015 02:58 PM, Janusz wrote:
>>> On 15.10.2015 at 08:41, Xiao Guangrong wrote:
>>>>
>>>>
>>>> On 10/15/2015 02:19 PM, Janusz wrote:
>>>>> On 15.10.2015 at 06:19, Xiao Guangrong wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> Well, the bug may not be in KVM. When this bug happened, I saw OVMF
>>>>>> only checked 1 CPU out; here is the log from OVMF's debug input:
>>>>>>
>>>>>> Flushing GCD
>>>>>> Flushing GCD
>>>>>> Flushing GCD
>>>>>> Flushing GCD
>>>>>> Flushing GCD
>>>>>> Flushing GCD
>>>>>> Flushing GCD
>>>>>> Flushing GCD
>>>>>> Flushing GCD
>>>>>> Flushing GCDs
>>>>>> Detect CPU count: 1
>>>>>>
>>>>>> So the startup code has been freed while the APs are still
>>>>>> running; I think that's why we saw the vCPUs executing at
>>>>>> unexpected addresses.
>>>>>>
>>>>>> After digging into OVMF's code, I noticed that the BSP waits for APs
>>>>>> for a fixed time period; however, KVM recent changes require zap all
>>>>>> mappings if CR0.CD is changed, that means the APs need more time to
>>>>>> startup.
>>>>>>
>>>>>> After the following change to OVMF, the bug is completely gone on my
>>>>>> side:
>>>>>>
>>>>>> --- a/UefiCpuPkg/CpuDxe/ApStartup.c
>>>>>> +++ b/UefiCpuPkg/CpuDxe/ApStartup.c
>>>>>> @@ -454,7 +454,9 @@ StartApsStackless (
>>>>>>    //
>>>>>>    // Wait 100 milliseconds for APs to arrive at the ApEntryPoint routine
>>>>>>    //
>>>>>> -  MicroSecondDelay (100 * 1000);
>>>>>> +  MicroSecondDelay (10 * 100 * 1000);
>>>>>>
>>>>>>    return EFI_SUCCESS;
>>>>>>  }
>>>>>>
>>>>>> Janusz, could you please check this instead? You can switch to your
>>>>>> previous kernel to do this test.
>>>>>>
>>>>>>
>>>>> OK, the first time I started the VM, the system booted
>>>>> successfully. When I turned it off and started it again, it
>>>>> restarted my VM at system boot a couple of times. Sometimes I also
>>>>> get very high CPU usage for no reason. Also, I get lower fps in
>>>>> GTA 5 than with kernel 4.1: something like 30-55, whereas on 4.1 I
>>>>> get 60 fps all the time. This is my new log:
>>>>> https://bpaste.net/show/61a122ad7fe5
>>>>>
>>>>
>>>> Just to confirm: the QEMU internal error did not appear any more, right?
>>> Yes, when I reverted your first patch, switched to -vga std from -vga
>>> none and didn't pass through my GPU (the case where I got this internal
>>> error), the VM started without problems. I didn't even get any VM
>>> restarts like with passthrough.
>>>
>>
>> Wow, it seems we have fixed the QEMU internal error now. :)
>>
>> Recently, Paolo has reverted some MTRR patches; was your test
>> based on these reverted patches?
>>
>> The GPU passthrough issue may be related to vfio (not sure). Alex, do
>> you have any idea?
>>
>> Laszlo, could you please check whether the root cause is reasonable and
>> fix it in OVMF if it's right?
>
>The code that you have found is in edk2's EFI_MP_SERVICES_PROTOCOL
>implementation -- more closely, its initial CPU counter code --, from
>edk2 git commit 533263ee5a7f.
>It is not specific to OVMF -- it is
>generic edk2 code for Intel processors. (I'm CC'ing Jordan and Chen Fan
>because they authored the patch in question.)
>
>If VCPUs need more time to rendezvous than written in the code, on
>recent KVM, then I think we should introduce a new FixedPCD in
>UefiCpuPkg (practically: a compile time constant) for the timeout. Which
>is not hard to do.
>
>However, we'll need two things:
>- an idea about the concrete rendezvous timeout to set, from OvmfPkg
>
>- a *detailed* explanation / elaboration on your words:
>
>  "KVM recent changes require zap all mappings if CR0.CD is changed,
>  that means the APs need more time to startup"
>
>  Preferably with references to Linux kernel commits and the Intel SDM,
>  so that n00bs like me can get a fleeting idea. Do you mean that with
>  caching disabled, the APs execute their rendezvous code (from memory)
>  more slowly?
>
>> BTW, OVMF handles #UD with no trace - nothing is killed, and no call trace
>> in the debug input...
>
>There *is* a trace (of any unexpected exception -- at least for the
>BSP), but unfortunately its location is not intuitive.
>
>The exception handler that is built into OVMF
>("UefiCpuPkg/Library/CpuExceptionHandlerLib") is again generic edk2
>code, and it prints the trace directly to the serial port, regardless of
>the fact that OVMF's DebugLib instance logs explicit DEBUGs to the QEMU
>debug port. (The latter can be directed to the serial port as well, if
>you build OVMF with -D DEBUG_ON_SERIAL_PORT, but this is not relevant
>here.)
>
>If you reproduce the issue while looking at the (virtual) serial port of
>the guest, I trust you will get a register dump.
>
>Thanks!
>Laszlo
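
For illustration only, the override suggested at the top of this message
might look roughly like the fragment below in an OVMF platform description
(.dsc) file. The PCD name is the one given above; the section placement and
the value of 1000000 microseconds (chosen to match the 10 * 100 * 1000 delay
from the quoted experiment) are assumptions, not a tested or agreed setting.

```
[PcdsFixedAtBuild]
  # Assumed value: 1 second, i.e. the 10 * 100 * 1000 microseconds used in
  # the quoted ApStartup.c experiment. The right number for OVMF is still
  # an open question in this thread.
  gUefiCpuPkgTokenSpaceGuid.PcdCpuApInitTimeOutInMicroSeconds|1000000
```

With such an override in place, CpuDxe would presumably read the value with
PcdGet32 (PcdCpuApInitTimeOutInMicroSeconds) and pass it to
MicroSecondDelay () in place of the hard-coded 100 * 1000.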