Re: [PATCH] KVM: nVMX: VMX instructions: fix segment checks when L1 is in long mode.

On Fri, Jun 24, 2016 at 03:10:03PM +0200, Paolo Bonzini wrote:
> On 24/06/2016 15:04, Quentin Casasnovas wrote:
> > On Thu, Jun 23, 2016 at 06:03:01PM +0200, Paolo Bonzini wrote:
> >>
> >>
> >> On 18/06/2016 11:01, Quentin Casasnovas wrote:
> >>> Cross-checking the KVM/VMX VMREAD emulation code with the Intel Software
> >>> Developer Manual Volume 3C - "VMREAD - Read Field from Virtual-Machine
> >>> Control Structure", I found that we're enforcing that the destination
> >>> operand is NOT located in a read-only data segment or any code segment when
> >>> the L1 is in long mode - BUT that check should only happen when it is in
> >>> protected mode.
> >>>
> >>> Shuffling the code a bit to make our emulation follow the specification
> >>> allows me to boot a Xen dom0 in a nested KVM and start HVM L2 guests
> >>> without problems.
> >>
> >> That's great, and I'm applying the patch, but it's also pretty weird. :)
> >>  Do you have a pointer to Xen source code that does a VMREAD into a
> >> read-only data segment or a code segment?
> > 
> > It is indeed pretty weird.  Looking at the Xen stack trace, it looks like
> > the vmread is writing to an on-stack buffer, and surely it must be writable
> > so I wonder if Xen might not be using an executable stack for some reason?
> > That would be a bit scary so I'm surely missing something.
> > 
> > Is there an easy way to see, from my KVM host, the segment permissions
> > set up by the guest?
> 
> Remove your patch, call dump_vmcs() where the #GP is injected, and
> you'll find the VMCS (including segment permissions, but not the
> instruction info field---you probably should add it) in dmesg.
> 

Thanks for the heads up :)

I had a bit more time to spend on this this morning, and attached is the
VMCS dump.  I've looked at vmcs_instruction_info and it appears the
segment referenced is SS (which matches the backtrace, where the
instruction causing the vmexit is "vmread %rbp, %rbp"), and it has odd
attributes:

  SS:   sel=0x0000, attr=0x1c000, limit=0xffffffff, base=0x0000000000000000

The low 14 bits of attr are all zero, so the segment type is 0 and KVM's VMX
emulation was injecting the #GP(0) because it believed we were about to write
to a read-only segment.  At least the stack isn't executable from what I can
tell!
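
For anyone who wants to decode the attr values in the dump by hand, here is
a minimal sketch in C.  It assumes attr is the raw VMCS guest segment
access-rights field (type in bits 3:0, S in bit 4, DPL in bits 6:5, P in
bit 7, then AVL/L/D-B/G in bits 12-15 and "segment unusable" in bit 16).
The struct and function names are made up for illustration, and the write
check only mirrors the rule quoted from the SDM above (no writes to a
read-only data segment or to any code segment); it is not KVM's actual code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the fields of one "attr=" value. */
struct seg_ar {
    unsigned type;   /* bits 3:0  segment type */
    bool s;          /* bit 4     1 = code/data, 0 = system */
    unsigned dpl;    /* bits 6:5  descriptor privilege level */
    bool present;    /* bit 7 */
    bool avl;        /* bit 12 */
    bool l;          /* bit 13    64-bit code segment */
    bool db;         /* bit 14    default operation size */
    bool g;          /* bit 15    granularity */
    bool unusable;   /* bit 16    segment unusable */
};

/* Split a raw access-rights value into its fields. */
static struct seg_ar decode_seg_ar(uint32_t attr)
{
    struct seg_ar ar;

    ar.type     = attr & 0xf;
    ar.s        = (attr >> 4) & 1;
    ar.dpl      = (attr >> 5) & 3;
    ar.present  = (attr >> 7) & 1;
    ar.avl      = (attr >> 12) & 1;
    ar.l        = (attr >> 13) & 1;
    ar.db       = (attr >> 14) & 1;
    ar.g        = (attr >> 15) & 1;
    ar.unusable = (attr >> 16) & 1;
    return ar;
}

/*
 * Protected-mode style write check: true if the type alone says the
 * segment is a read-only data segment (code bit 3 and writable bit 1
 * both clear) or a code segment (bit 3 set).
 */
static bool write_rejected_by_type(const struct seg_ar *ar)
{
    return (ar->type & 0xa) == 0 || (ar->type & 0x8) != 0;
}
```

Feeding it the SS value 0x1c000 gives type 0 with only the D/B, G and
unusable bits set, so a check that looks at the type alone treats it as a
read-only data segment.  DS (0x0c093) decodes to type 3, a writable data
segment, and passes.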

Attached is the full VMCS dump, to which I've added a printk() showing the
'type' (all zeroes) and vmcs_instruction_info, in case my analysis above is
complete nonsense.
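
To make the SS claim checkable, here is a rough sketch in C of how the
memory-operand fields can be pulled out of the instruction info value from
the printk (info=e2614920).  It assumes the VM-exit instruction-information
layout the SDM gives for VMREAD/VMWRITE: scaling in bits 1:0, address size
in bits 9:7, register/memory in bit 10, segment register in bits 17:15,
index register in bits 21:18 (invalid if bit 22 is set) and base register
in bits 26:23 (invalid if bit 27 is set).  The struct and helper names are
hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical view of the memory-operand part of the instruction info. */
struct vmx_mem_operand {
    unsigned scaling;     /* bits 1:0   index scale: 0=1, 1=2, 2=4, 3=8 */
    unsigned addr_size;   /* bits 9:7   0=16-bit, 1=32-bit, 2=64-bit */
    bool is_reg;          /* bit 10     1 = register operand, 0 = memory */
    unsigned seg_reg;     /* bits 17:15 0=ES 1=CS 2=SS 3=DS 4=FS 5=GS */
    unsigned index_reg;   /* bits 21:18 */
    bool index_valid;     /* bit 22 clear */
    unsigned base_reg;    /* bits 26:23 */
    bool base_valid;      /* bit 27 clear */
};

/* Split a raw instruction-information value into its operand fields. */
static struct vmx_mem_operand decode_instr_info(uint32_t info)
{
    struct vmx_mem_operand op;

    op.scaling     = info & 3;
    op.addr_size   = (info >> 7) & 7;
    op.is_reg      = (info >> 10) & 1;
    op.seg_reg     = (info >> 15) & 7;
    op.index_reg   = (info >> 18) & 0xf;
    op.index_valid = !((info >> 22) & 1);
    op.base_reg    = (info >> 23) & 0xf;
    op.base_valid  = !((info >> 27) & 1);
    return op;
}

/* Map the 3-bit segment encoding to a name; 6 and 7 are not defined. */
static const char *seg_name(unsigned seg_reg)
{
    static const char *const names[] = { "ES", "CS", "SS", "DS", "FS", "GS" };

    return seg_reg < 6 ? names[seg_reg] : "??";
}
```

For 0xe2614920 the segment field comes out as 2, i.e. SS, with a 64-bit
memory operand and no valid index register.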

Quentin
[ 9853.506447] kvm: wr: read-only segment type==0, info=e2614920
[ 9853.506464] *** Guest State ***
[ 9853.506466] CR0: actual=0x0000000080050033, shadow=0x0000000080050033, gh_mask=fffffffffffffff7
[ 9853.506467] CR4: actual=0x00000000001526e0, shadow=0x00000000001526e0, gh_mask=fffffffffffff871
[ 9853.506467] CR3 = 0x000000007aa37000
[ 9853.506468] RSP = 0xffff83007b73fab0  RIP = 0xffff82d0801e629e
[ 9853.506469] RFLAGS=0x00000202         DR7 = 0x0000000000000400
[ 9853.506470] Sysenter RSP=ffff83007b73ffc0 CS:RIP=e008:ffff82d08022c480
[ 9853.506471] CS:   sel=0xe008, attr=0x0a09b, limit=0xffffffff, base=0x0000000000000000
[ 9853.506472] DS:   sel=0x0000, attr=0x0c093, limit=0xffffffff, base=0x0000000000000000
[ 9853.506473] SS:   sel=0x0000, attr=0x1c000, limit=0xffffffff, base=0x0000000000000000
[ 9853.506474] ES:   sel=0x0000, attr=0x0c093, limit=0xffffffff, base=0x0000000000000000
[ 9853.506475] FS:   sel=0x0000, attr=0x0c093, limit=0xffffffff, base=0x0000000000000000
[ 9853.506476] GS:   sel=0x0000, attr=0x0c093, limit=0xffffffff, base=0x0000000000000000
[ 9853.506477] GDTR:                           limit=0x0000efff, base=0xffff83007b4d7000
[ 9853.506478] LDTR: sel=0x0000, attr=0x1c000, limit=0xffffffff, base=0x0000000000000000
[ 9853.506479] IDTR:                           limit=0x00000fff, base=0xffff83007b4e3000
[ 9853.506480] TR:   sel=0xe040, attr=0x0008b, limit=0x00000067, base=0xffff83007b4e6c80
[ 9853.506481] EFER =     0x0000000000000d00  PAT = 0x0000050100070406
[ 9853.506481] DebugCtl = 0x0000000000000000  DebugExceptions = 0x0000000000000000
[ 9853.506482] Interruptibility = 00000000  ActivityState = 00000000
[ 9853.506483] *** Host State ***
[ 9853.506484] RIP = 0xffffffffa00f6daf  RSP = 0xffff880131aafd00
[ 9853.506485] CS=0010 SS=0018 DS=0000 ES=0000 FS=0000 GS=0000 TR=0040
[ 9853.506486] FSBase=00007fbf6bfff700 GSBase=ffff88021e240000 TRBase=ffff88021e253b40
[ 9853.506486] GDTBase=ffff88021e249000 IDTBase=ffffffffff57b000
[ 9853.506487] CR0=0000000080050033 CR3=0000000004b21000 CR4=00000000001426e0
[ 9853.506488] Sysenter RSP=0000000000000000 CS:RIP=0010:ffffffff81a02740
[ 9853.506489] EFER = 0x0000000000000d01  PAT = 0x0407010600070106
[ 9853.506490] *** Control State ***
[ 9853.506491] PinBased=0000003f CPUBased=b6a06dfa SecondaryExec=000000eb
[ 9853.506491] EntryControls=0000d3ff ExitControls=002fefff
[ 9853.506492] ExceptionBitmap=00060042 PFECmask=00000000 PFECmatch=00000000
[ 9853.506493] VMEntry: intr_info=000000fc errcode=00000000 ilen=00000000
[ 9853.506494] VMExit: intr_info=00000000 errcode=00000000 ilen=00000006
[ 9853.506495]         reason=00000017 qualification=0000000000000008
[ 9853.506495] IDTVectoring: info=00000000 errcode=00000000
[ 9853.506496] TSC Offset = 0xffffe8cdfc3ca592
[ 9853.506497] TPR Threshold = 0x00
[ 9853.506497] EPT pointer = 0x000000000467f01e
[ 9853.506498] Virtual processor ID = 0x0007
