Hi Anthony,

Thank you for your comments!

On Mon, 2011-11-07 at 11:37 -0600, Anthony Liguori wrote:
> On 11/06/2011 02:40 PM, Sasha Levin wrote:
> > Hi all,
> >
> > I'm planning on doing a small fork of the KVM tool to turn it into a
> > 'Secure KVM' enabled hypervisor. Now you probably ask yourself, Huh?
> >
> > The idea was discussed briefly couple of months ago, but never got off
> > the ground - which is a shame IMO.
> >
> > It's easy to explain the problem: If an attacker finds a security hole
> > in any of the devices which are exposed to the guest, the attacker would
> > be able to either crash the guest, or possibly run code on the host
> > itself.
> >
> > The solution is also simple to explain: Split the devices into different
> > processes and use seccomp to sandbox each device into the exact set of
> > resources it needs to operate, nothing more and nothing less.
> >
> > Since I'll be basing it on the KVM tool, which doesn't really emulate
> > that many legacy devices, I'll focus first on the virtio family for the
> > sake of simplicity (and covering 90% of the options).
> >
> > This is my basic overview of how I'm planning on implementing the
> > initial POC:
> >
> > 1. First I'll focus on the simple virtio-rng device, it's simple enough
> > to allow us to focus on the aspects which are important for the POC
> > while still covering most bases (i.e. sandbox to single file
> > - /dev/urandom and such).
> >
> > 2. Do it on a one process per device concept, where for each device
> > (notice - not device *type*) requested, a new process which handles it
> > will be spawned.
> >
> > 3. That process will be limited exactly to the resources it needs to
> > operate, for example - if we run a virtio-blk device, it would be able
> > to access only the image file which it should be using.
> >
> > 4. Connection between hypervisor and devices will be based on unix
> > sockets, this should allow for better separation compared to other
> > approaches such as shared memory.
> >
> > 5. While performance is an aspect, complete isolation is more important.
> > Security is primary, performance is secondary.
> >
> > 6. Share as much code as possible with current implementation of virtio
> > devices, make it possible to run virtio devices either like it's being
> > done now, or by spawning them as separate processes - the amount of
> > specific code for the separate process case should be minimal.
> >
> >
> > Thats all I have for now, comments are *very* welcome.
>
> I thought about this a bit and have some ideas that may or may not help.
>
> 1) If you add device save/load support, then it's something you can potentially
> use to give yourself quite a bit of flexibility in changing the sandbox. At any
> point in run time, you can save the device model's state in the sandbox, destroy
> the sandbox, and then build a new sandbox and restore the device to its former
> state.
>
> This might turn out to be very useful in supporting things like device hotplug
> and/or memory hot plug.
>
> 2) I think it's largely possible to implement all device emulation without doing
> any dynamic memory allocation. Since memory allocation DoS is something you
> have to deal with anyway, I suspect most device emulation already uses a fixed
> amount of memory per device. This can potentially dramatically simplify things.
>
> 3) I think virtio can/should be used as a generic "backend to frontend"
> transport between the device model and the tool.

virtio requires server and client to have shared memory, so if we
already go with shared memory we can just let the device manage the
actual virtio driver directly, no?

Also, things like interrupts would also require some sort of a different
IPC, which would complicate things a bit.

> 4) Lack of select() is really challenging. I understand why it's not there
> since it can technically be emulated but it seems like a no-risk syscall to
> whitelist and it would make programming in a sandbox so much easier. Maybe
> Andrea has some comments here?

I might be missing something here. There are several of these which
would be nice to have, and if we can get seccomp filters we have good
flexibility with which APIs we allow for each device.

> Regards,
>
> Anthony Liguori
>
> >

--

Sasha.
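
To make the one-process-per-device idea from the quoted plan a bit more concrete, here is a rough sketch of what spawning a sandboxed virtio-rng style backend could look like. This is not code from the KVM tool: the function names (spawn_rng_device, rng_device_loop) and the one-byte request protocol are made up for illustration. It assumes Linux and uses the existing strict seccomp mode (read, write, exit and sigreturn only), since seccomp filters are not something we can rely on yet. It also shows why the lack of select() hurts: the device can only block in read() on its single socket.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <linux/seccomp.h>
#include <stdint.h>
#include <sys/prctl.h>
#include <sys/socket.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical device loop, runs inside the sandbox.
 * Illustrative protocol: the hypervisor sends one byte saying how many
 * random bytes it wants (1..255); a 0 byte or EOF shuts the device down.
 * Only read() and write() are used, which strict seccomp permits.
 */
void rng_device_loop(int sock, int rng_fd)
{
    uint8_t want;
    uint8_t buf[255];

    while (read(sock, &want, 1) == 1 && want > 0) {
        ssize_t got = read(rng_fd, buf, want);
        if (got <= 0 || write(sock, buf, (size_t)got) != got)
            break;
    }
    /* Strict mode allows exit(2) but not exit_group(2), which is what
     * libc's _exit() uses, so issue the raw syscall instead. */
    syscall(SYS_exit, 0);
}

/*
 * Spawn one sandboxed process for one device instance.
 * Returns the hypervisor's end of the unix socket, or -1 on error.
 */
int spawn_rng_device(pid_t *child)
{
    int sv[2];

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;

    *child = fork();
    if (*child < 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    if (*child == 0) {
        close(sv[0]);
        /* Acquire every resource first: open() is forbidden afterwards. */
        int rng_fd = open("/dev/urandom", O_RDONLY);
        if (rng_fd < 0 || prctl(PR_SET_SECCOMP, SECCOMP_MODE_STRICT) != 0)
            _exit(1);
        rng_device_loop(sv[1], rng_fd); /* does not return */
    }
    close(sv[1]);
    return sv[0];
}
```

The hypervisor side would call spawn_rng_device() once per requested device, write a request byte to the returned descriptor and read the reply back. A virtio-blk backend would follow the same pattern with the image file opened before the prctl() call, although strict mode is almost certainly too tight for it (no lseek, no pread), which is where seccomp filters would come in.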