Re: Secure KVM

Anthony Liguori <anthony@xxxxxxxxxxxxx> · Mon, 07 Nov 2011 11:39:30 -0600

On 11/07/2011 03:26 AM, Avi Kivity wrote:
On 11/06/2011 10:40 PM, Sasha Levin wrote:
Hi all,

I'm planning on doing a small fork of the KVM tool to turn it into a
'Secure KVM' enabled hypervisor. Now you probably ask yourself, Huh?

Actually, no.

The idea was discussed briefly a couple of months ago, but never got off
the ground - which is a shame IMO.

It's easy to explain the problem: If an attacker finds a security hole
in any of the devices which are exposed to the guest, the attacker would
be able to either crash the guest, or possibly run code on the host
itself.

Crashing the guest is fine (not 100% - you can have unprivileged code
managing a device, in which case we allow unprivileged code to crash the
entire guest - but that's rare).  Running code on the host is also fine;
we have a permissions system in place to prevent damage; see libvirt's
sVirt code, which uses selinux to disallow an exploited guest from
touching other guests or host data.  It should be able to protect
host-only networks as well (not sure if it does that).

The real risk is that the exploited hypervisor turns around and exploits
yet another hole in the system, like a privileged daemon that the
hypervisor is allowed to be in contact with, or the kernel itself, via a
vulnerability in the kernel interfaces.

The solution is also simple to explain: Split the devices into different
processes and use seccomp to sandbox each device into the exact set of
resources it needs to operate, nothing more and nothing less.

One thing to beware of is memory hotplug.  If the memory map is static,
then a fork() once everything is set up (with MAP_SHARED) allows all
processes to access guest memory.  However, if memory hotplug is
supported (or planned to be supported), then you can't do that, as
seccomp doesn't allow you to run mmap() in confined processes.
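
As a rough illustration of the static-memory-map case described above, here is a minimal sketch (Python used only for brevity; it is not KVM tool code). It assumes a Unix host, and `GUEST_MEM_SIZE` and `shared_memory_fork_demo` are hypothetical names. The mapping is created with MAP_SHARED before fork(), so the child sees the same pages without ever calling mmap() itself.

```python
import mmap
import os

GUEST_MEM_SIZE = 4096  # hypothetical, tiny stand-in for guest RAM


def shared_memory_fork_demo() -> bytes:
    # Map "guest memory" as shared and anonymous *before* forking, so the
    # child inherits the same pages rather than a private copy.
    mem = mmap.mmap(-1, GUEST_MEM_SIZE,
                    flags=mmap.MAP_SHARED | mmap.MAP_ANONYMOUS,
                    prot=mmap.PROT_READ | mmap.PROT_WRITE)
    pid = os.fork()
    if pid == 0:
        # Child: stands in for a device process. It only touches the
        # mapping it inherited and never calls mmap() itself.
        try:
            mem[0:4] = b"virt"
        finally:
            os._exit(0)
    # Parent: wait for the child, then read what it wrote.
    os.waitpid(pid, 0)
    data = bytes(mem[0:4])
    mem.close()
    return data
```

The parent observes the child's write because both processes share the mapping. A region added after the fork would not be visible to the child, which is the hotplug problem raised here.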

This means they have to use RPC to the main process in order to access
memory, which is going to slow them down significantly.
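
To make the cost concrete, here is a minimal sketch of what such an RPC path could look like, assuming a Unix socketpair between the main process and one device process. The wire format (`REQ`) and all function names are hypothetical, chosen for illustration only. Every guest memory access becomes a message to the main process, and every read is a full round trip.

```python
import os
import socket
import struct

REQ = struct.Struct("!cII")  # hypothetical wire format: op, offset, length


def recv_exact(sock, n):
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            return None  # peer closed the socket
        buf += chunk
    return buf


def serve_guest_memory(sock, mem):
    """Main process: handle read/write requests until the device hangs up."""
    while True:
        hdr = recv_exact(sock, REQ.size)
        if hdr is None:
            return
        op, offset, length = REQ.unpack(hdr)
        if offset + length > len(mem):
            return  # out-of-range request: drop the device
        if op == b"W":
            payload = recv_exact(sock, length)
            if payload is None:
                return
            mem[offset:offset + length] = payload
        else:
            sock.sendall(bytes(mem[offset:offset + length]))


def rpc_write(sock, offset, data):
    sock.sendall(REQ.pack(b"W", offset, len(data)) + data)


def rpc_read(sock, offset, length):
    sock.sendall(REQ.pack(b"R", offset, length))
    return recv_exact(sock, length)


def rpc_demo():
    mem = bytearray(4096)  # private to the main process, not shared
    main_end, dev_end = socket.socketpair()
    pid = os.fork()
    if pid == 0:
        # Device process: every guest memory access goes over the socket.
        status = 1
        try:
            main_end.close()
            rpc_write(dev_end, 16, b"ring")
            status = 0 if rpc_read(dev_end, 16, 4) == b"ring" else 1
        finally:
            os._exit(status)
    dev_end.close()
    serve_guest_memory(main_end, mem)  # returns when the device exits
    main_end.close()
    _, wstatus = os.waitpid(pid, 0)
    return os.WEXITSTATUS(wstatus), bytes(mem[16:20])
```

Compared with touching a shared mapping directly, each access here costs at least one send and one receive plus a context switch, which is where the slowdown would come from.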

If you treat the sandbox as ephemeral by leveraging save/restore, you can throw 
away and rebuild the device model on every memory change.  While not a super 
cheap operation, it's at least amortized over time.
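
A minimal sketch of that idea, assuming a Unix host and a device whose state can be serialized: on a memory change the main process asks the device to save its state, discards the process, remaps memory, and forks a fresh device that restores the state. All names and the one-byte command protocol are hypothetical, and replacing the whole mapping is a simplification of real hotplug.

```python
import json
import mmap
import os
import socket


def new_guest_memory(size):
    return mmap.mmap(-1, size, flags=mmap.MAP_SHARED | mmap.MAP_ANONYMOUS,
                     prot=mmap.PROT_READ | mmap.PROT_WRITE)


def device_loop(sock, mem, state):
    while True:
        cmd = sock.recv(1)
        if cmd == b"w":  # do one unit of work against guest memory
            state["requests"] += 1
            mem[len(mem) - 1] = state["requests"]
            sock.sendall(b"k")
        elif cmd == b"s":  # save: hand the state back, then exit
            sock.sendall(json.dumps(state).encode())
            return
        else:  # EOF or unknown command
            return


def spawn_device(mem, saved_state):
    """Fork a device process that inherits `mem` and restores `saved_state`."""
    parent_end, child_end = socket.socketpair()
    pid = os.fork()
    if pid == 0:
        try:
            parent_end.close()
            # A real device process would enter its seccomp sandbox here.
            device_loop(child_end, mem, json.loads(saved_state))
        finally:
            os._exit(0)
    child_end.close()
    return pid, parent_end


def do_work(sock):
    sock.sendall(b"w")
    return sock.recv(1) == b"k"


def save_and_discard(pid, sock):
    sock.sendall(b"s")
    chunks = []
    while True:
        chunk = sock.recv(4096)
        if not chunk:  # device exited after sending its state
            break
        chunks.append(chunk)
    sock.close()
    os.waitpid(pid, 0)
    return b"".join(chunks).decode()


def hotplug_demo():
    mem = new_guest_memory(4096)
    pid, sock = spawn_device(mem, '{"requests": 0}')
    do_work(sock)
    # Memory change: save, throw the sandbox away, remap, rebuild.
    saved = save_and_discard(pid, sock)
    mem.close()
    mem = new_guest_memory(8192)
    pid, sock = spawn_device(mem, saved)
    do_work(sock)
    last = mem[len(mem) - 1]
    saved = save_and_discard(pid, sock)
    return last, len(mem), json.loads(saved)
```

The rebuilt device never calls mmap() itself; it inherits the new mapping at fork time, so the confinement rules stay the same while the cost is paid once per memory change.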

Regards,

Anthony Liguori

Since I'll be basing it on the KVM tool, which doesn't really emulate
that many legacy devices, I'll focus first on the virtio family for the
sake of simplicity (and covering 90% of the options).

Since virtio is so performance-sensitive, my feeling is that it is
better to audit it, and rely on sandboxing for the
non-performance-sensitive parts of the device model.  Of course for a
POC it's fine to start with it.

This is my basic overview of how I'm planning on implementing the
initial POC:

<snip plan>

That's all I have for now, comments are *very* welcome.

This plan is quite similar to the equivalent plans for qemu.  However,
as kvm-tool is much smaller than qemu, you're likely to have a much easier
time and make much faster progress.  This is really a great use of
kvm-tool, to explore new ideas rather than catching up; and I'm sure
your experience will prove useful for qemu as well.
