Joanna Rutkowska wrote:
Anthony Liguori wrote:
Avi Kivity wrote:
No. Paravirtualization just augments the standard hardware interface,
it doesn't replace it as in Xen.
NB, unlike Xen, we can (and do) run qemu as non-root. Things like
RHEV-H and oVirt constrain the qemu process with SELinux.
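To make "run qemu as non-root" concrete, here is a minimal Python sketch of the usual pattern a root-owned launcher follows before exec'ing an emulator: clear supplementary groups, set the gid, then set the uid, and check that root cannot be regained. The function names `drop_privileges` and `spawn_unprivileged` and the uid/gid 65534 ("nobody" on many distributions) are illustrative assumptions. This is not how libvirt, RHEV-H, or oVirt actually implement it, and the SELinux confinement mentioned above is not shown.

```python
import os


def drop_privileges(uid: int, gid: int) -> None:
    """Irreversibly switch the calling process to uid/gid (hypothetical helper).

    Order matters: supplementary groups and the gid must be changed while
    we still hold root, because after setuid() to a non-zero uid we lose
    the capability needed to change them.
    """
    os.setgroups([])   # drop supplementary groups inherited from root
    os.setgid(gid)     # as root, sets real, effective and saved gid
    os.setuid(uid)     # as root, sets real, effective and saved uid
    # Sanity check: regaining root must now fail.
    try:
        os.setuid(0)
    except PermissionError:
        return
    raise RuntimeError("privilege drop failed: setuid(0) still works")


def spawn_unprivileged(argv: list, uid: int, gid: int) -> int:
    """Fork, drop privileges in the child, exec argv; return its exit code."""
    pid = os.fork()
    if pid == 0:
        try:
            drop_privileges(uid, gid)
            os.execvp(argv[0], argv)  # only returns by raising on failure
        except BaseException:
            os._exit(127)  # never fall back into the parent's code path
    _, status = os.waitpid(pid, 0)
    return os.waitstatus_to_exitcode(status)
```

A root-owned launcher might call `spawn_unprivileged(["qemu-system-x86_64", "-version"], 65534, 65534)`. When the caller is not root, the child cannot drop groups and the function returns 127.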
On Xen you can get rid of qemu entirely if you run only PV domains.
Also, you can use qemu to provide the backends to a Xen PV guest (see -M
xenpv). The effect is that you are moving that privileged code from the
kernel (netback/blkback) to userspace (qemu -M xenpv).
In general, KVM tends to keep code in userspace unless moving it into the
kernel is absolutely necessary. That's a fundamental difference from Xen,
which tends to do the opposite.
But the difference is that in the case of Xen one can *easily* move the
backends to small unprivileged VMs. In that case it doesn't matter that the
code runs in kernel mode; it is still only in an unprivileged domain.
Right, in KVM, Linux == hypervisor. A process is our "unprivileged
domain". Putting an unprivileged domain within an unprivileged domain
is probably not helpful from a security perspective since the exposure
surface is identical.
Sandboxing a process in a monolithic OS like Linux is generally
considered infeasible for anything more complex than a hello world
program. The process <-> kernel interface seems to be just too fat. See
e.g. the recent Linux kernel overflows by Spender.
That's the point of mandatory access control. Of course, you need the
right policy, and Spender highlighted an issue with the standard RHEL
SELinux policy, but that should now be addressed upstream.
Also, SELinux seems to me like a step in the wrong direction. It not
only adds complexity to the already-too-complex kernel, but also requires
complex configuration. See e.g. this paper [1] for a nice example of how
to escape an SE-sandboxed qemu on FC8 due to SELinux policy misconfiguration.
When some people tried to add an SELinux-like mechanism to the Xen
hypervisor, it only resulted in an exploitable heap overflow in Xen [2].
It's certainly fair to argue the merits of SELinux as a mandatory access
control mechanism.
Again though, that's the point of MLS. Our first line of defense is
qemu. Our second line of defense is traditional POSIX discretionary access
control. Our third line of defense is namespace isolation (à la lxc).
Our fourth line of defense is mandatory access control (à la SELinux and
AppArmor).
If you take a somewhat standard deployment like RHEV-H, an awful lot of
things have to go wrong before you can successfully exploit the system.
And 5.4 doesn't even implement all of what's possible. If you're really
looking to harden, you can be much more aggressive about privileges and
namespace isolation.
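As one illustration of being "more aggressive about privileges", the Python sketch below sets Linux's `no_new_privs` process flag, after which neither the process nor anything it execs can gain privileges through set-user-ID/set-group-ID binaries or file capabilities. Assumptions to flag loudly: `PR_SET_NO_NEW_PRIVS` exists only on Linux 3.5 and later (it postdates this thread), the call goes through the C library's `prctl()` via `ctypes`, and the wrapper names `set_no_new_privs` and `no_new_privs_enabled` are hypothetical. It is not part of the RHEV-H setup described above.

```python
import ctypes
import os

# Constants from <linux/prctl.h>; Linux >= 3.5 only.
PR_SET_NO_NEW_PRIVS = 38
PR_GET_NO_NEW_PRIVS = 39

# dlopen(NULL): resolve prctl() from the C library already loaded in-process.
_libc = ctypes.CDLL(None, use_errno=True)
_prctl = _libc.prctl
_prctl.argtypes = [ctypes.c_int, ctypes.c_ulong, ctypes.c_ulong,
                   ctypes.c_ulong, ctypes.c_ulong]
_prctl.restype = ctypes.c_int


def _raise_errno() -> None:
    err = ctypes.get_errno()
    raise OSError(err, os.strerror(err))


def set_no_new_privs() -> None:
    """Irreversibly forbid privilege gain via execve() for this process tree."""
    # The kernel requires the unused arguments to be zero.
    if _prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0:
        _raise_errno()


def no_new_privs_enabled() -> bool:
    """Return True if the no_new_privs flag is set on the calling process."""
    ret = _prctl(PR_GET_NO_NEW_PRIVS, 0, 0, 0, 0)
    if ret < 0:
        _raise_errno()
    return ret == 1
```

A launcher would typically call `set_no_new_privs()` in the child after `fork()` and before `exec`, so the flag is inherited by the confined process without affecting the launcher itself.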
Regards,
Anthony Liguori
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html