On 12/08/13 18:43, Laszlo Ersek wrote: > On 12/06/13 13:03, Paolo Bonzini wrote: >> Thanks to your binary I now reproduced the issue and it looks like the >> 64-bit->16-bit switch works: > > Thank you for spending (apparently more than a little) time on this! > >> >> qemu-system-x86-4081 [001] 62650.335040: kvm_exit: reason CR_ACCESS rip 0x3cf7ae45 info 0 0 >> qemu-system-x86-4081 [001] 62650.335041: kvm_cr: cr_write 0 = 0x32 >> qemu-system-x86-4081 [001] 62650.335046: kvm_entry: vcpu 0 >> >> This is the "mov %rax, %cr0". PE and PG are turned off. > > I'm surprised by this result. The instruction you refer to is below > "_AsmTransferControl_al_0000" (in the original, unpatched code). > > I had earlier added an infinite loop right below that label (a different > loop than my xxxx debug loop), and it was *never* reached in my test. > That is, from the lret that I reported as problematic, to the > instruction you refer to, the CPU would have had to cross (and finish) > the infinite loop that I had added earlier. And that never happened in > my test. > > I had added that loop at "_AsmTransferControl_al_0000" immediately > precisely because I wanted to see if the label is reached and the > problem is with something below that label, or with the first lret. I > sent my email to the KVM list after I had isolated the problem to the > first LRET: > > http://thread.gmane.org/gmane.comp.bios.tianocore.devel/5297/focus=5325 > > On 12/04/13 19:05, Laszlo Ersek wrote: >> I tested if the (intended) target location of the LRET is reached, and >> it is not. (It's easy to test by adding a small infinite loop, moving >> it around, and seeing if the VM is spinning with or without producing >> a bunch of output on the debug port.) It's *really* that >> internally-targeted LRET that causes a reboot. [...] First, you're right and I'm wrong. I can see the triple fault in my kvm trace as well. Second... This is incredible. Guess what the following patch does. ASM_GLOBAL ASM_PFX(AsmTransferControl) ASM_PFX(AsmTransferControl): # rcx S3WakingVector :DWORD # rdx AcpiLowMemoryBase :DWORD lea _AsmTransferControl_al_0000(%rip), %eax movq $0x2800000000, %r8 orq %r8, %rax pushq %rax shrd $20, %ecx, %ebx andl $0x0f, %ecx movw %cx, %bx movl %ebx, jmp_addr(%rip) lret _AsmTransferControl_al_0000: .byte 0x0b8, 0x30, 0 # mov ax, 30h as selector movl %eax, %ds movl %eax, %es movl %eax, %fs movl %eax, %gs movl %eax, %ss +.code16 + movw $0x402, %dx + movb $0x58, %al +qqq: + outb %al, %dx + jmp qqq + movb $0x59, %al + outb %al, %dx +.code64 movq %cr0, %rax movq %cr4, %rbx The infinite loop that I used to "pin down" the first instruction that broke around lret -- it was useless. The above patch should produce an infinite string of "X" characters on the qemu debug port. "After" the inifnite string, it should produce a "Y" character. In practice, I'm seeing *one* "X" character, and then a triple fault. The "jmp qqq" instruction *itself* causes a triple fault (a guest reboot), masking the problem that I was looking for. When I had such an empty infinite loop (at that time, still without the outb) right before the lret, it worked. As soon as I moved it below _AsmTransferControl_al_0000 (ie. under the target of the lret), the jmp *itself* caused a triple fault (guest reboot), causing me to think that the *lret* caused the reboot. Because, the tight jmp loop would "always" work, and if I'm seeing a reboot loop instead of tight spinning right after the lret, that means that the lret caused the reboot, right? Right? This is the binary: 0000000000000037 <qqq>: 37: ee out %al,(%dx) 38: eb fd jmp 37 <qqq> Opcode Instruction Op/ 64-Bit Compat/ Description En Mode Leg Mode EB cb JMP rel8 A Valid Valid Jump short, RIP = RIP + 8-bit displacement sign extended to 64-bits My belief in the sanity of x86 has been shattered to its core. I don't feel stupid. I feel cheated. Thank you for your help. Laszlo -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html