Re: [edk2] apparent KVM problem with LRET in TianoCore S3 resume trampoline

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 12/08/13 18:43, Laszlo Ersek wrote:
> On 12/06/13 13:03, Paolo Bonzini wrote:

>> Thanks to your binary I now reproduced the issue and it looks like the
>> 64-bit->16-bit switch works:
> 
> Thank you for spending (apparently more than a little) time on this!
> 
>>
>>  qemu-system-x86-4081  [001] 62650.335040: kvm_exit:             reason CR_ACCESS rip 0x3cf7ae45 info 0 0
>>  qemu-system-x86-4081  [001] 62650.335041: kvm_cr:               cr_write 0 = 0x32
>>  qemu-system-x86-4081  [001] 62650.335046: kvm_entry:            vcpu 0
>>
>> 	This is the "mov %rax, %cr0". PE and PG are turned off.
> 
> I'm surprised by this result. The instruction you refer to is below
> "_AsmTransferControl_al_0000" (in the original, unpatched code).
> 
> I had earlier added an infinite loop right below that label (a different
> loop than my xxxx debug loop), and it was *never* reached in my test.
> That is, from the lret that I reported as problematic, to the
> instruction you refer to, the CPU would have had to cross (and finish)
> the infinite loop that I had added earlier. And that never happened in
> my test.
> 
> I had added that loop at "_AsmTransferControl_al_0000" immediately
> precisely because I wanted to see if the label is reached and the
> problem is with something below that label, or with the first lret. I
> sent my email to the KVM list after I had isolated the problem to the
> first LRET:
> 
> http://thread.gmane.org/gmane.comp.bios.tianocore.devel/5297/focus=5325
> 
> On 12/04/13 19:05, Laszlo Ersek wrote:
>> I tested if the (intended) target location of the LRET is reached, and
>> it is not. (It's easy to test by adding a small infinite loop, moving
>> it around, and seeing if the VM is spinning with or without producing
>> a bunch of output on the debug port.) It's *really* that
>> internally-targeted LRET that causes a reboot. [...]

First, you're right and I'm wrong. I can see the triple fault in my kvm
trace as well.

Second... This is incredible. Guess what the following patch does.

 ASM_GLOBAL ASM_PFX(AsmTransferControl)
 ASM_PFX(AsmTransferControl):
     # rcx S3WakingVector    :DWORD
     # rdx AcpiLowMemoryBase :DWORD
     lea   _AsmTransferControl_al_0000(%rip), %eax
     movq  $0x2800000000, %r8
     orq   %r8, %rax
     pushq %rax
     shrd  $20, %ecx, %ebx
     andl  $0x0f, %ecx
     movw  %cx, %bx
     movl  %ebx, jmp_addr(%rip)
     lret
 _AsmTransferControl_al_0000:
     .byte    0x0b8, 0x30, 0      # mov ax, 30h as selector
     movl  %eax, %ds
     movl  %eax, %es
     movl  %eax, %fs
     movl  %eax, %gs
     movl  %eax, %ss
+.code16
+    movw $0x402, %dx
+    movb $0x58, %al
+qqq:
+    outb %al, %dx
+    jmp qqq
+    movb $0x59, %al
+    outb %al, %dx
+.code64
     movq  %cr0, %rax
     movq  %cr4, %rbx

The infinite loop that I used to "pin down" the first instruction that
broke around lret -- it was useless.

The above patch should produce an infinite string of "X" characters on
the qemu debug port. "After" the inifnite string, it should produce a
"Y" character.

In practice, I'm seeing *one* "X" character, and then a triple fault.

The "jmp qqq" instruction *itself* causes a triple fault (a guest
reboot), masking the problem that I was looking for.

When I had such an empty infinite loop (at that time, still without the
outb) right before the lret, it worked. As soon as I moved it below
_AsmTransferControl_al_0000 (ie. under the target of the lret), the jmp
*itself* caused a triple fault (guest reboot), causing me to think that
the *lret* caused the reboot. Because, the tight jmp loop would "always"
work, and if I'm seeing a reboot loop instead of tight spinning right
after the lret, that means that the lret caused the reboot, right? Right?

This is the binary:

0000000000000037 <qqq>:
  37:   ee                      out    %al,(%dx)
  38:   eb fd                   jmp    37 <qqq>

Opcode  Instruction Op/ 64-Bit Compat/   Description
                    En  Mode   Leg Mode
EB cb   JMP rel8    A   Valid  Valid     Jump short, RIP = RIP + 8-bit
                                         displacement sign extended
                                         to 64-bits

My belief in the sanity of x86 has been shattered to its core. I don't
feel stupid. I feel cheated.

Thank you for your help.
Laszlo

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]
  Powered by Linux