On Wed, Aug 24, 2011 at 3:46 PM, Daniel P. Berrange <berrange@xxxxxxxxxx> wrote:
> On Wed, Aug 24, 2011 at 03:20:57PM +0100, Stefan Hajnoczi wrote:
>> On Tue, Aug 23, 2011 at 4:31 PM, Daniel P. Berrange <berrange@xxxxxxxxxx> wrote:
>> > On Tue, Aug 23, 2011 at 04:24:46PM +0100, Stefan Hajnoczi wrote:
>> >> On Tue, Aug 23, 2011 at 12:15 PM, Daniel P. Berrange
>> >> <berrange@xxxxxxxxxx> wrote:
>> >> > I was at the KVM Forum / LinuxCon last week and there were many
>> >> > interesting things discussed which are relevant to ongoing libvirt
>> >> > development. Here was the list that caught my attention. If I have
>> >> > missed any, fill in the gaps....
>> >> >
>> >> > - Sandbox/container KVM. The Solaris port of KVM puts QEMU inside
>> >> > a zone so that an exploit of QEMU can't escape into the full OS.
>> >> > Containers are Linux's parallel of Zones, and while not nearly as
>> >> > secure yet, it would still be worth using more containers support
>> >> > to confine QEMU.
>> >>
>> >> Can you elaborate on why Linux containers are "not nearly as secure"
>> >> [as Solaris Zones]?
>> >
>> > Mostly because the Linux namespace functionality is far from complete,
>> > notably lacking proper UID/GID/capability separation, and UID/GID
>> > virtualization wrt filesystems. The longer answer is here:
>> >
>> > https://wiki.ubuntu.com/UserNamespace
>> >
>> > So at this time you can't build a secure container on Linux, relying
>> > just on DAC alone. You have to add in a MAC layer ontop of the container
>> > to get full security benefits, which obviously defeats the point of
>> > using the container as a backup for failure in the MAC layer.
>>
>> Thanks, that is interesting. I still don't understand why that is a
>> problem. Linux containers (lxc) uses a different pid namespace (no
>> ptrace worries), file system root (restricted to a subdirectory tree),
>> forbids most device nodes, etc. Why does the user namespace matter
>> for security in this case?
>
> A number of reasons really...
>
> If user ID '0' on the host starts a container, and a process inside
> the container does 'setuid(500)', then any user outside the container
> with UID 500 will be able to kill that process. Only user ID '0' should
> have been allowed todo that.
>
> It will also let non-root user IDs on the host OS, start containers
> and have root uid=0 inside the container.
>
> Finally, any files created inside the container with, say, uid 500
> will be accessible by any other process with UID 500, in either the
> host or any other container

These points mean that the host can peek inside containers and has
access to their processes and files. But from the point of view of
libvirt running inside a container, there is no security problem.

This is much like saying that root on the host can modify KVM guest
disk images. That is true, but I don't see it as a security problem
because root on the host is the trusted part of the system.

>> I think it matters when giving multiple containers access to the same
>> file system. Is that what you'd like to do for libvirt?
>
> Each container would have to share a (readonly) view onto the host
> filesystem so it can see the QEMU emulator install / libraries. There
> would also have to be some writable areas per QEMU container. QEMU
> inside the container would be set to run as some non-root UID (from
> the container's POV). So both problem 1 & 3 above would impact the
> security of this confinement.

But is there a way to escape confinement? If not, then this is secure.

Stefan

--
libvir-list mailing list
libvir-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/libvir-list
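
For reference, the first point in the quoted list (a host user with UID
500 being able to kill a container process that did setuid(500)) can be
modelled with a small sketch. This is only a simplified illustration of
a UID-based signal check without user namespaces, not the kernel's
actual logic (which also considers saved set-UIDs and capabilities such
as CAP_KILL). The names Proc and may_signal are hypothetical and exist
only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Proc:
    """Hypothetical, simplified process credentials (no user namespaces)."""
    uid: int        # UIDs are global: there is no per-container mapping
    container: str  # label for illustration only; the check never reads it


def may_signal(sender: Proc, target: Proc) -> bool:
    """Simplified DAC rule: root may signal anyone, else UIDs must match."""
    return sender.uid == 0 or sender.uid == target.uid


# QEMU inside a container after it has called setuid(500).
qemu = Proc(uid=500, container="guest1")

# Unrelated users on the host.
host_user_500 = Proc(uid=500, container="host")
host_user_501 = Proc(uid=501, container="host")
host_root = Proc(uid=0, container="host")

print(may_signal(host_user_500, qemu))  # same UID, different container
print(may_signal(host_user_501, qemu))  # different UID
print(may_signal(host_root, qemu))      # root on the host
```

Because the check in this model only compares UIDs, the container
boundary plays no part in the decision, which is the gap that UID
virtualization via user namespaces is meant to close.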