On 10/16/15 05:05, Xiao Guangrong wrote:
>
> On 10/16/2015 12:18 AM, Laszlo Ersek wrote:
>> CC'ing Jordan and Chen Fan.
>>
>> On 10/15/15 09:10, Xiao Guangrong wrote:
>>>
>>> On 10/15/2015 02:58 PM, Janusz wrote:
>>>> On 15.10.2015 at 08:41, Xiao Guangrong wrote:
>>>>>
>>>>> On 10/15/2015 02:19 PM, Janusz wrote:
>>>>>> On 15.10.2015 at 06:19, Xiao Guangrong wrote:
>>>>>>>
>>>>>>> Well, the bug may not be in KVM. When this bug happened, I saw that
>>>>>>> OVMF detected only 1 CPU. Here is the log from OVMF's debug output:
>>>>>>>
>>>>>>> Flushing GCD
>>>>>>> Flushing GCD
>>>>>>> Flushing GCD
>>>>>>> Flushing GCD
>>>>>>> Flushing GCD
>>>>>>> Flushing GCD
>>>>>>> Flushing GCD
>>>>>>> Flushing GCD
>>>>>>> Flushing GCD
>>>>>>> Flushing GCDs
>>>>>>> Detect CPU count: 1
>>>>>>>
>>>>>>> So the startup code has been freed while the APs are still running;
>>>>>>> I think that is why we saw the vCPUs executing at unexpected
>>>>>>> addresses.
>>>>>>>
>>>>>>> After digging into OVMF's code, I noticed that the BSP waits for the
>>>>>>> APs for a fixed time period, however, KVM recent changes require zap
>>>>>>> all mappings if CR0.CD is changed, that means the APs need more time
>>>>>>> to startup.
>>>>>>>
>>>>>>> After the following change to OVMF, the bug is completely gone on my
>>>>>>> side:
>>>>>>>
>>>>>>> --- a/UefiCpuPkg/CpuDxe/ApStartup.c
>>>>>>> +++ b/UefiCpuPkg/CpuDxe/ApStartup.c
>>>>>>> @@ -454,7 +454,9 @@ StartApsStackless (
>>>>>>>    //
>>>>>>>    // Wait 100 milliseconds for APs to arrive at the ApEntryPoint routine
>>>>>>>    //
>>>>>>> -  MicroSecondDelay (100 * 1000);
>>>>>>> +  MicroSecondDelay (10 * 100 * 1000);
>>>>>>>
>>>>>>>    return EFI_SUCCESS;
>>>>>>>  }
>>>>>>>
>>>>>>> Janusz, could you please check this instead? You can switch to your
>>>>>>> previous kernel to do this test.
>>>>>>>
>>>>>> Ok, now the first time I started the VM I was able to start the system
>>>>>> successfully.
>>>>>> When I turned it off and started it again, it restarted my VM at
>>>>>> system boot a couple of times. Sometimes I also get very high CPU
>>>>>> usage for no reason. Also, I get fewer fps in GTA 5 than with kernel
>>>>>> 4.1: I get something like 30-55, but on 4.1 I get 60 fps all the time.
>>>>>> This is my new log: https://bpaste.net/show/61a122ad7fe5
>>>>>>
>>>>>
>>>>> Just to confirm: the QEMU internal error did not appear any more, right?
>>>> Yes, when I reverted your first patch, switched to -vga std from -vga
>>>> none and didn't pass through my GPU (the case when I got this internal
>>>> error), the VM started without problems. I didn't even get any VM
>>>> restarts like with passthrough.
>>>>
>>>
>>> Wow, it seems we have fixed the QEMU internal error now. :)
>>>
>>> Recently, Paolo has reverted some MTRR patches; was your test
>>> based on these reverted patches?
>>>
>>> The GPU passthrough issue may be related to vfio (not sure). Alex, do
>>> you have any idea?
>>>
>>> Laszlo, could you please check whether the root cause is reasonable and
>>> fix it in OVMF if it's right?
>>
>> The code that you have found is in edk2's EFI_MP_SERVICES_PROTOCOL
>> implementation -- more closely, its initial CPU counter code --, from
>> edk2 git commit 533263ee5a7f. It is not specific to OVMF -- it is
>> generic edk2 code for Intel processors. (I'm CC'ing Jordan and Chen Fan
>> because they authored the patch in question.)
>
> Okay, good to know it, I do not have much knowledge of edk2 and OVMF... :(
>
>>
>> If VCPUs need more time to rendezvous than written in the code, on
>> recent KVM, then I think we should introduce a new FixedPCD in
>> UefiCpuPkg (practically: a compile time constant) for the timeout. Which
>> is not hard to do.
>>
>> However, we'll need two things:
>> - an idea about the concrete rendezvous timeout to set, from OvmfPkg
>>
>> - a *detailed* explanation / elaboration on your words:
>>
>>   "KVM recent changes require zap all mappings if CR0.CD is changed,
>>   that means the APs need more time to startup"
>>
>>   Preferably with references to Linux kernel commits and the Intel SDM,
>>   so that n00bs like me can get a fleeting idea. Do you mean that with
>>   caching disabled, the APs execute their rendezvous code (from memory)
>>   more slowly?
>
> Kernel commit b18d5431acc causes the vCPUs to need more time to start up
> because:
> - it zaps all the mappings for the guest memory in the EPT or shadow page
>   table, so VM-exits are required to rebuild the mappings for all memory
>   accesses.
>
> - if a device is passed through to the guest and the IOMMU lacks the
>   snooping control feature, the memory will become UC after CR0.CD is
>   set to 1.
>
> And a generic factor is: if the guest has more vCPUs, then more time is
> needed. That is why the bug is hardly ever triggered on guests with few
> vCPUs. I guess we need a self-adapting way to handle the case...

Thanks, this should be enough for composing a commit message.

>
>>
>>> BTW, OVMF handles #UD with no trace - nothing is killed, and no call
>>> trace in the debug output...
>>
>> There *is* a trace (of any unexpected exception -- at least for the
>> BSP), but unfortunately its location is not intuitive.
>>
>> The exception handler that is built into OVMF
>> ("UefiCpuPkg/Library/CpuExceptionHandlerLib") is again generic edk2
>> code, and it prints the trace directly to the serial port, regardless of
>> the fact that OVMF's DebugLib instance logs explicit DEBUGs to the QEMU
>> debug port. (The latter can be directed to the serial port as well, if
>> you build OVMF with -D DEBUG_ON_SERIAL_PORT, but this is not relevant
>> here.)
>>
>> If you reproduce the issue while looking at the (virtual) serial port of
>> the guest, I trust you will get a register dump.
>
> Er...
> it seems there is no dump in the serial output; I attached it to this
> mail. The system continues to run with 1 CPU enabled......

Actually, the guest is in a reboot loop, it just may not be obvious from
the log. Whenever you see

  SecCoreStartupWithStack(0xFFFCC000, 0x818000)

that means the guest has rebooted.

The fault handler that I described becomes active when a fault gets
injected visibly to the guest -- or happens within the guest entirely --
for example, a null pointer dereference, and the fault handler can
actually handle it. I guess a triple fault occurs or some such.

Thanks
Laszlo
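
A note on the "self-adapting way" mentioned in the quoted text: one possible shape is to poll the AP check-in counter and stop once no new AP has arrived for a quiet window, with a hard upper bound, instead of sleeping for one fixed period. The following is only a minimal sketch in plain C under stated assumptions. wait_for_aps, the two callback types and the sim_* helpers are hypothetical names made up for illustration; they are not edk2 or UefiCpuPkg APIs, and the timing values are placeholders, not recommendations.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical platform hooks (assumptions, not edk2 interfaces). */
typedef uint32_t (*read_ap_count_fn)(void *ctx);   /* APs checked in so far */
typedef void (*delay_us_fn)(void *ctx, uint32_t us); /* busy-wait or stall  */

/*
 * Poll the AP counter every poll_us microseconds. Stop when the counter has
 * not changed for quiet_us, or when max_us has elapsed in total, whichever
 * comes first. Returns the last observed AP count.
 */
static uint32_t
wait_for_aps(read_ap_count_fn read_count, delay_us_fn delay, void *ctx,
             uint32_t poll_us, uint32_t quiet_us, uint32_t max_us)
{
    uint32_t elapsed = 0;
    uint32_t quiet = 0;
    uint32_t last;

    assert(poll_us > 0); /* a zero poll interval would never make progress */

    last = read_count(ctx);
    while (elapsed < max_us && quiet < quiet_us) {
        uint32_t now;

        delay(ctx, poll_us);
        elapsed += poll_us;

        now = read_count(ctx);
        if (now != last) {
            last = now;   /* a new AP arrived: restart the quiet window */
            quiet = 0;
        } else {
            quiet += poll_us;
        }
    }
    return last;
}

/* Simulated platform for trying the loop: AP i checks in at arrive_us[i]. */
struct sim_platform {
    uint32_t now_us;
    const uint32_t *arrive_us;
    uint32_t num_aps;
};

static uint32_t
sim_read_count(void *ctx)
{
    const struct sim_platform *p = ctx;
    uint32_t n = 0;

    for (uint32_t i = 0; i < p->num_aps; i++) {
        if (p->arrive_us[i] <= p->now_us) {
            n++;
        }
    }
    return n;
}

static void
sim_delay(void *ctx, uint32_t us)
{
    struct sim_platform *p = ctx;

    p->now_us += us; /* advance simulated time instead of really waiting */
}
```

With simulated APs that check in at 50 ms, 120 ms and 400 ms, a 10 ms poll and a 500 ms quiet window, the loop counts all three, whereas a single 100 ms sleep would have seen only the first. The quiet window is still a guess about the largest gap between arrivals, so this narrows the problem rather than removing it.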