Re: KVM is type 1 hypervisor, but...

On 05/06/2017 16:39, Sylvain Leroux wrote:
> ARCHITECTURAL PRINCIPLES FOR VIRTUAL COMPUTER SYSTEMS
> Robert P. Goldberg -- pp20 & following
> http://www.dtic.mil/dtic/tr/fulltext/u2/772809.pdf

Thanks, that was useful.

The definition does not mention a conventional OS, but rather an
"extended host".  From page 22:

  Type I--The VMM runs on a bare machine.
  Type II--The VMM runs on an extended host [53,75], under the
  host operating system.

Page 99 extends this a bit: "Type II (extended machine host) virtual
machine organization is one in which the VMM runs, not on a bare machine
as the supervisor, but rather on an extended host under the host
supervisor".

"Extended host" is not really an expression that is used anymore, but it
is defined on page 30 of the thesis:

  A pseudo-machine [99], also called an extended machine [53] or a user
  machine [75], is a composite machine produced through a combination of
  hardware and software, in which the machine's apparent architecture has
  been changed slightly to make the machine more convenient to use.
  Typically, these architectural changes have taken the form of removing
  I/O channels and devices, and adding system calls to perform I/O and
  other operations.

Note how [53,75] are the same citations as on page 22; this is exactly
what I mentioned earlier: "type I" hypervisors (Goldberg says VMMs) run
in supervisor mode, "type II" VMMs run in user mode.

>> If you really want to cut hypervisors in two, you could distinguish
>> "type-1" hypervisors that run in supervisor mode (x86 says ring 0) from
>> "type-2" hypervisors that run in user mode (x86 says ring 3).
>
> I wonder if there is some tool allowing to measure the amount of time a
> process spend in each ring?

You don't need that.  You just need to know that x86 is not even
virtualizable from ring 3; all you can do there is emulation (e.g.
dynamic binary translation).

So, QEMU running in dynamic binary translation mode is a type 2 VMM.
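As an aside on the measurement question quoted above: Linux does keep
per-process counters that split CPU time into user, kernel and guest
time, although they do not settle the type 1 / type 2 question.  A
minimal sketch, assuming the /proc/<pid>/stat layout described in
proc(5) (utime is field 14, stime field 15, guest_time field 43); the
helper names are made up for illustration.

```python
import os


def parse_stat_times(stat_line, ticks_per_second):
    """Return user, kernel and guest CPU seconds from a /proc/<pid>/stat line."""
    # The command name (field 2) is parenthesised and may contain spaces,
    # so only split what follows the last ')'.
    tail = stat_line[stat_line.rfind(")") + 2:].split()
    # tail[0] is field 3 (state), so field n lives at tail[n - 3].
    utime, stime, guest = int(tail[11]), int(tail[12]), int(tail[40])
    return {
        "user": utime / ticks_per_second,    # per proc(5), includes guest time
        "kernel": stime / ticks_per_second,
        "guest": guest / ticks_per_second,   # time spent running a virtual CPU
    }


def process_times(pid):
    """Read the counters for a live process (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        return parse_stat_times(f.read(), os.sysconf("SC_CLK_TCK"))
```

Calling process_times() on the PID of a QEMU process that uses KVM would
typically show a large "guest" share, while a QEMU process doing binary
translation would be expected to report none, since its emulation is
ordinary user-mode code.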

KVM runs on a bare machine (kernel mode, we'd say today), but it
delegates some services (I/O emulation) to a less privileged component
(QEMU).  This is not a type 2 VMM, it is a type 1 VMM that follows
security principles such as privilege separation.
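The split between the two components can be seen from user space: the
less privileged side talks to the in-kernel KVM module through ioctls on
/dev/kvm.  A minimal sketch, assuming a Linux host and the generic ioctl
number encoding used on x86 and ARM; the function name is made up for
illustration, and this is not how QEMU itself is structured.

```python
import fcntl
import os

# KVM_GET_API_VERSION is _IO(0xAE, 0x00) in <linux/kvm.h>.  The value 0xAE00
# assumes the generic ioctl encoding (x86, ARM); some architectures differ.
KVM_GET_API_VERSION = 0xAE00


def kvm_api_version(path="/dev/kvm"):
    """Ask the in-kernel KVM module for its API version from user mode.

    Returns None when the device is missing or not accessible.
    """
    try:
        fd = os.open(path, os.O_RDWR | os.O_CLOEXEC)
    except OSError:
        return None
    try:
        # The ioctl is serviced in kernel mode; only the result comes back.
        return fcntl.ioctl(fd, KVM_GET_API_VERSION, 0)
    except OSError:
        return None
    finally:
        os.close(fd)
```

The kernel's KVM API documentation gives 12 as the stable version
number, so that is the value to expect on a host where /dev/kvm is
usable.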

>> Another case where the distinction is substantially blurred by computers
>> and OSes newer than the 1970s is I/O devices.  In this case, VFIO allows
>> I/O devices to be used directly by the virtual machine with no overhead
>> for I/O calls, and together with KVM no overhead for interrupts either.
>>
>> In other words, kernel modules like KVM or Apple's Hypervisor.framework
>> augment conventional OSes with the abilities of a VMM, but KVM and
>> Hypervisor.framework (and VirtualBox too) are definitely "bare metal".
> This is a little bit outside of the scope of my initial question, but
> isn't "augment[ing] conventional OSes with the abilities of a VMM"
> actually increasing the thread surface of the system?

s/thread/threat/g (or threaten depending on the grammar), I guess :)

> Let me make the devil's advocate here: let's imagine I run BSD & Linux
> guests on x86 host. With VMWare ESX, a bug in the Linux Kernel would
> thread *only* the Linux guests. But with KVM (and Xen, for what it
> worth), it would thread *all* guests.

No, only if the guests can jump out of the hypervisor.

If you have code execution in the hypervisor, the game is over.  Because
the hypervisor (Xen, KVM, ESX) runs in ring 0, there is no need to use a
Linux vulnerability to extend your privilege further.

If you have code execution in QEMU, you could then use that Linux
vulnerability to gain code execution in the host kernel and attack all
guests.  Then we're in something like this scenario:

> Or is there some way in KVM to protect the VMM sub-system from other
> parts of the kernel (esp. from modules/device drivers)?

Not exactly, but there are ways in Linux to protect other parts of the
kernel from QEMU.  The most common one is "sVirt", which is basically a
set of SELinux policies and conventions that let you run each virtual
machine in a highly confined process that can only access the resources
intended for that virtual machine.
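To make the confinement a bit more concrete: with sVirt as commonly
deployed through libvirt, each QEMU process gets an SELinux context of
type svirt_t plus its own pair of MCS categories, and the guest's disk
images carry the same categories.  A minimal sketch of comparing two
such contexts; the labels in the usage note are illustrative rather
than taken from a real host, the function names are made up, and the
isolation check is a simplification of what the SELinux policy actually
enforces.

```python
def parse_selinux_context(context):
    """Split 'user:role:type:level' into parts; the level may contain ':'."""
    # /proc/<pid>/attr/current may end with a newline or a NUL byte.
    user, role, type_, level = context.strip("\x00\n ").split(":", 3)
    sensitivity, _, cats = level.partition(":")
    # Category ranges such as 'c0.c1023' are not expanded in this sketch.
    categories = frozenset(cats.split(",")) if cats else frozenset()
    return {"user": user, "role": role, "type": type_,
            "sensitivity": sensitivity, "categories": categories}


def mutually_isolated(ctx_a, ctx_b):
    """True if both are svirt_t and neither category set contains the other."""
    a, b = parse_selinux_context(ctx_a), parse_selinux_context(ctx_b)
    if a["type"] != "svirt_t" or b["type"] != "svirt_t":
        return False
    return not (a["categories"] >= b["categories"]
                or b["categories"] >= a["categories"])
```

For example, "system_u:system_r:svirt_t:s0:c107,c434" and
"system_u:system_r:svirt_t:s0:c12,c900" would count as mutually
isolated here, whereas two processes sharing the same category pair
would not.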

Paolo


