On Mon, Nov 7, 2011 at 1:27 PM, Stefan Hajnoczi <stefanha@xxxxxxxxx> wrote:
> On Mon, Nov 7, 2011 at 10:17 AM, Sasha Levin <levinsasha928@xxxxxxxxx> wrote:
>> Hi Avi,
>>
>> Thank you for your comments!
>>
>> Just one question below:
>>
>> On Mon, Nov 7, 2011 at 11:26 AM, Avi Kivity <avi@xxxxxxxxxx> wrote:
>>> Crashing the guest is fine (not 100% - you can have unprivileged code
>>> managing a device, in which case we allow unprivileged code to crash the
>>> entire guest - but that's rare).  Running code on the host is also fine;
>>
>> On Mon, Nov 7, 2011 at 11:26 AM, Avi Kivity <avi@xxxxxxxxxx> wrote:
>>> One thing to beware of is memory hotplug.  If the memory map is static,
>>> then a fork() once everything is set up (with MAP_SHARED) allows all
>>> processes to access guest memory.  However, if memory hotplug is
>>> supported (or planned to be supported), then you can't do that, as
>>> seccomp doesn't allow you to run mmap() in confined processes.
>>>
>>> This means they have to use RPC to the main process in order to access
>>> memory, which is going to slow them down significantly.
>>
>> Is the risk of non-privileged guest code being able to exploit the
>> hypervisor to access guest memory which it's not allowed to access
>> really that small? I actually thought it would be one of the main
>> concerns we'd need to handle, but from what I understand from you it's
>> an irrelevant scenario.
>>
>> If it's really the case, then mapping guest memory is preferable.
>> While mmap() is an issue, I think it's a great example of why seccomp
>> filters are needed in the kernel, and might be a good chance to push
>> that feature forward. In that sense, 'Secure KVM' could be used as a
>> guinea pig both for seccomp filters and future QEMU work.
>
> This is a really interesting topic - something that we've discussed in
> QEMU as well.
>
> Doing it with seccomp is really hard since that only allows read(2),
> write(2), exit(2), and sigreturn(2).  I think using seccomp means that
> host devices (e.g. actual network and block device I/O) are
> implemented outside the seccomp because it requires other syscalls.
> Then the seccomp process would simply do hardware emulation with IPCs
> for all actual I/O.

Yup, that's why it might be a good chance to explore seccomp filters.
Being able to filter not just calls, but also some parameters of the
calls will allow us to tailor a pretty well defined wrapper for each
and every device.

>
> Where does the VNC server, the image formats, etc go?  It would be
> nice to confine them too.

Regarding image formats, just wondering - was there ever any plan to
merge (at least some of them) into the kernel?

> In that respect I think Avi's ideas about using safe programming
> languages (even if just a NaCl toolchain) are nice because they are
> more general and apply to all of the codebase.
>
> Stefan
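
To make the "filter on call parameters" idea above a bit more concrete,
here is a rough sketch of the kind of per-device policy I have in mind.
It is only a userspace model of the allow/deny decision, not actual
seccomp code: sc_rule, sc_policy, sc_allowed() and the image fd number
(5) are all made up for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/syscall.h>   /* SYS_read, SYS_write, SYS_exit (Linux) */

/* Hypothetical rule: allow one syscall, optionally only on one fd. */
struct sc_rule {
    long nr;        /* syscall number */
    bool check_fd;  /* if true, the first argument must equal fd */
    long fd;        /* fd the call is restricted to (when check_fd) */
};

/* Hypothetical per-device policy: a whitelist of rules. */
struct sc_policy {
    const struct sc_rule *rules;
    size_t nr_rules;
};

/* Return true if the policy would let the call (nr, arg0) through. */
bool sc_allowed(const struct sc_policy *p, long nr, long arg0)
{
    for (size_t i = 0; i < p->nr_rules; i++) {
        const struct sc_rule *r = &p->rules[i];

        if (r->nr != nr)
            continue;
        if (!r->check_fd || r->fd == arg0)
            return true;
    }
    return false;   /* default deny */
}

/*
 * Example: a block device backend that may only read/write its own
 * image file, assumed here to be open on fd 5, and may exit.
 */
static const struct sc_rule blk_rules[] = {
    { SYS_read,  true,  5 },
    { SYS_write, true,  5 },
    { SYS_exit,  false, 0 },
};

const struct sc_policy blk_policy = {
    blk_rules, sizeof(blk_rules) / sizeof(blk_rules[0])
};
```

With something like this, sc_allowed(&blk_policy, SYS_read, 5) is true,
while a read on any other fd, or any syscall not in the table, is
denied. A real filter would have to make the same decision inside the
kernel, which is exactly what plain seccomp can't do today.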