On 12/06/13 13:03, Paolo Bonzini wrote:
> On 05/12/2013 19:29, Laszlo Ersek wrote:
>> On 12/05/13 18:42, Paolo Bonzini wrote:
>>> On 05/12/2013 17:12, Laszlo Ersek wrote:
>>>> Hi,
>>>>
>>>> I'm working on S3 suspend/resume in OVMF. The problem is that I'm getting an
>>>> unexpected guest reboot for code (LRET) that works on physical hardware. I
>>>> tried to trace the problem with ftrace, but I didn't get any mentions of
>>>> em_ret_far(). (Maybe I was looking in the wrong place.)
>>>
>>> What does ftrace say anyway?
>>
>> (pls. see in the next msg I sent)
>
> Actually I meant the ftrace without any patches.
>
> Thanks to your binary I now reproduced the issue and it looks like the
> 64-bit->16-bit switch works:

Thank you for spending (apparently more than a little) time on this!

>
> qemu-system-x86-4081 [001] 62650.335040: kvm_exit: reason CR_ACCESS rip 0x3cf7ae45 info 0 0
> qemu-system-x86-4081 [001] 62650.335041: kvm_cr: cr_write 0 = 0x32
> qemu-system-x86-4081 [001] 62650.335046: kvm_entry: vcpu 0
>
> This is the "mov %rax, %cr0". PE and PG are turned off.

I'm surprised by this result.

The instruction you refer to is below "_AsmTransferControl_al_0000" (in the original, unpatched code). I had earlier added an infinite loop right below that label (a different loop than my xxxx debug loop), and it was *never* reached in my test. That is, from the lret that I reported as problematic, to the instruction you refer to, the CPU would have had to cross (and finish) the infinite loop that I had added earlier. And that never happened in my test.

I had added that loop at "_AsmTransferControl_al_0000" immediately, precisely because I wanted to see if the label is reached and the problem is with something below that label, or with the first lret.
I sent my email to the KVM list after I had isolated the problem to the first LRET:

http://thread.gmane.org/gmane.comp.bios.tianocore.devel/5297/focus=5325

On 12/04/13 19:05, Laszlo Ersek wrote:
> I tested if the (intended) target location of the LRET is reached, and
> it is not. (It's easy to test by adding a small infinite loop, moving
> it around, and seeing if the VM is spinning with or without producing
> a bunch of output on the debug port.) It's *really* that
> internally-targeted LRET that causes a reboot.

[...]

I have absolutely no clue why this code executes for you and doesn't for me :) What guest RAM size did you test with?

> qemu-system-x86-4081 [001] 62650.335047: kvm_exit: reason MSR_READ rip 0x3cf7ae4e info 0 0
> qemu-system-x86-4081 [001] 62650.335048: kvm_msr: msr_read c0000080 = 0x100
> qemu-system-x86-4081 [001] 62650.335048: kvm_entry: vcpu 0
> qemu-system-x86-4081 [001] 62650.335048: kvm_exit: reason MSR_WRITE rip 0x3cf7ae53 info 0 0
> qemu-system-x86-4081 [001] 62650.335049: kvm_msr: msr_write c0000080 = 0x0
> qemu-system-x86-4081 [001] 62650.335050: kvm_entry: vcpu 0
>
> LME is turned off.
>
> qemu-system-x86-4081 [001] 62650.335050: kvm_exit: reason CR_ACCESS rip 0x3cf7ae55 info 304 0
> qemu-system-x86-4081 [001] 62650.335050: kvm_cr: cr_write 4 = 0x640
> qemu-system-x86-4081 [001] 62650.335053: kvm_entry: vcpu 0
>
> PAE is turned off.
>
> qemu-system-x86-4081 [001] 62650.335054: kvm_exit: reason CR_ACCESS rip 0x11e6 info 0 0
> qemu-system-x86-4081 [001] 62650.335054: kvm_cr: cr_write 0 = 0x33
> qemu-system-x86-4081 [001] 62650.335054: kvm_entry: vcpu 0
>
> Here we're already in real mode. The weird RIP is explained by
> the first few bytes after the FACS resume vector:

From this point on you were debugging the Linux wakeup code, in "arch/x86/realmode/rm/wakeup_asm.S". I think.
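Paolo's annotations ("PE and PG are turned off", "LME is turned off", "PAE is turned off") each come from a single bit in the logged values. The minimal Python sketch below checks them. The bit positions are architectural: CR0.PE = bit 0, CR0.PG = bit 31, CR4.PAE = bit 5, and in the EFER MSR (0xC0000080) LME = bit 8 and LMA = bit 10. The helper names and table layout are illustrative only and are not part of KVM or trace-cmd.

```python
# Architectural bit positions (Intel SDM Vol. 3 / AMD APM Vol. 2).
CR0_PE = 1 << 0     # Protection Enable
CR0_PG = 1 << 31    # Paging
CR4_PAE = 1 << 5    # Physical Address Extension
EFER_LME = 1 << 8   # Long Mode Enable (MSR 0xC0000080)
EFER_LMA = 1 << 10  # Long Mode Active (status bit maintained by the CPU)

# Which flags to report for each register (illustrative selection).
FLAGS = {
    "CR0": {"PE": CR0_PE, "PG": CR0_PG},
    "CR4": {"PAE": CR4_PAE},
    "EFER": {"LME": EFER_LME, "LMA": EFER_LMA},
}

# (event, register, value) triples copied from the kvm_cr / kvm_msr trace lines.
TRACE = [
    ("cr_write 0 @ rip 0x3cf7ae45", "CR0", 0x32),
    ("msr_read c0000080", "EFER", 0x100),
    ("msr_write c0000080", "EFER", 0x0),
    ("cr_write 4", "CR4", 0x640),
    ("cr_write 0 @ rip 0x11e6", "CR0", 0x33),
]


def decode(value: int, bits: dict) -> dict:
    """Return, for each named flag, whether its bit is set in value."""
    return {name: bool(value & mask) for name, mask in bits.items()}


if __name__ == "__main__":
    for event, reg, value in TRACE:
        print(f"{event:30} {reg:4} = {value:#x} -> {decode(value, FLAGS[reg])}")
```

The decoded values show two more details. The EFER read returns LME set but LMA already clear, which is consistent with the preceding CR0 write having disabled paging. The final CR0 write (0x33) sets PE again while PG stays off.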
>
> 0x9a1d:0000: cli
> 0x9a1d:0001: cld
> 0x9a1d:0002: ljmp $9900,$11d7

ENTRY(wakeup_start)
	cli
	cld
	LJMPW_RM(3f)
3:
	/* Apparently some dimwit BIOS programmers don't know how to
	   program a PM to RM transition, and we might end up here with
	   junk in the data segment descriptor registers. The only way
	   to repair that is to go into PM and fix it ourselves... */
	[...]

From Linux kernel commit 4b4f7280.

> The page tables are, ahem, crap:
>
> 000c000: 6750 fe01 0000 0000 0000 0000 0000 0000 gP..............
> 000c010: 0000 0000 0000 0000 0000 0000 0000 0000 ................
> 000c020: 0000 0000 0000 0000 0000 0000 0000 0000 ................
> 000c030: 0000 0000 0000 0000 0000 0000 0000 0000 ................
> 000c040: 0000 0000 0000 0000 0000 0000 0000 0000 ................
> 000c050: 0000 0000 0000 0000 0000 0000 0000 0000 ................
> 000c060: 0000 0000 0000 0000 0000 0000 0000 0000 ................
> 000c070: 0000 0000 0000 0000 0000 0000 0000 0000 ................
> 000c080: 0000 0000 0000 0000 0000 0000 0000 0000 ................
> 000c090: 0000 0000 0000 0000 0000 0000 0000 0000 ................
> 000c0a0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
> 000c0b0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
> 000c0c0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
> 000c0d0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
> 000c0e0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
> 000c0f0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
>
> This is 0x9c000. Strikes any bell?

We're wildly corrupting OS memory during OVMF S3 resume. That's a known problem and the next stage for me to figure out (with Jordan's help hopefully):

http://thread.gmane.org/gmane.comp.bios.tianocore.devel/5297/focus=5321
http://thread.gmane.org/gmane.comp.bios.tianocore.devel/5297/focus=5325

So, your tracing reached / debugged code that I had never ever reached. And my report was precisely about not reaching it.
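The real-mode addresses in the disassembly can be checked with the usual segment * 16 + offset rule. The Python sketch below assumes the ljmp operands ($9900, $11d7) are hexadecimal, as the surrounding 0x9a1d:... addresses suggest; the function and constant names are illustrative.

Under that assumption, the jump target is exactly 7 bytes past the resume vector. That equals cli (1 byte) + cld (1 byte) + a direct far jmp ptr16:16 (5 bytes). So LJMPW_RM(3f) lands on label 3, right after itself, but with CS reloaded to 0x9900. From there on, IP values such as the rip 0x11e6 in the trace are presumably offsets relative to that segment.

The same formula also bounds what a 16-bit CS:IP pair can reach at all. 0xFFFF:0xFFFF is 0x10FFEF with A20 enabled, just past 1 MiB.

```python
ONE_MIB = 0x100000


def linear(segment: int, offset: int) -> int:
    """Real-mode linear address: segment * 16 + offset (both 16-bit)."""
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset must be 16-bit values")
    return (segment << 4) + offset


# Highest byte reachable through any CS:IP pair (A20 enabled, no wraparound).
MAX_CS_IP = linear(0xFFFF, 0xFFFF)

# FACS resume vector as disassembled above: 0x9a1d:0000.
entry = linear(0x9A1D, 0x0000)
# Target of "ljmp $9900,$11d7"; assumes both operands are hexadecimal.
target = linear(0x9900, 0x11D7)

if __name__ == "__main__":
    print(f"entry  = {entry:#x}")
    print(f"target = {target:#x} (entry + {target - entry} bytes)")
    print(f"highest CS:IP-addressable byte = {MAX_CS_IP:#x}")
```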
Once we do reach that code, it's expected to blow up, but first I wanted to get there.

Again, the 64-bit->16-bit switch (in the original, unpatched edk2/OVMF code) never worked for me. I think I did find the reason for that though, please see

http://thread.gmane.org/gmane.comp.bios.tianocore.devel/5343/focus=5365

especially the last patch attached to it. The likely reason for the failure I was seeing is that the 16-bit code had been relocated to way above 1MB and could not be addressed with the 16-bit CS:IP notation at all.

Thanks!
Laszlo