As everyone knows, sVirt is our solution for isolating guest resources from other (potentially malicious) guests through SELinux labelling of the appropriate files / device nodes. This has been pretty effective since we introduced it to libvirt.

In the last year or two, particularly in the cloud arena, there has been a big shift towards network based storage. Initially we were relying on kernel drivers / FUSE layers that exposed this network storage as devices or nodes in the host filesystem, so sVirt still stood a chance of being useful, provided the device / FUSE layer supported labelling. Now, though, QEMU has native support for talking to gluster, ceph/rbd, iscsi and even nfs servers. This support is increasingly used in preference to the kernel drivers / FUSE layers, since it provides a simpler and thus (in theory) better performing I/O path for the network storage, and does not require any privileged setup tasks on the host ahead of time.

The problem is that I believe this is blowing a decent sized hole in our sVirt isolation story. eg when we launch QEMU with an argument like this:

  -drive 'file=rbd:pool/image:auth_supported=none:\
          mon_host=mon1.example.org\:6321\;mon2.example.org\:6322\;\
          mon3.example.org\:6322,if=virtio,format=raw'

we are trusting QEMU to only ever access the disk volume 'pool/image'. There are, in all likelihood, many 100's or 1000's of disk images on the server it is connecting to, and AFAICT nothing stops QEMU from accessing any of them. There is no currently implemented mechanism by which the sVirt label that QEMU runs under is made available to the remote RBD server for enforcement, nor any way for libvirt to tell the RBD server which label was applied to which disk. The same applies to Gluster, iSCSI and NFS when they are accessed directly by a network client inside the QEMU process.
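For context, applications normally configure this storage via the libvirt guest XML rather than a raw -drive string; a sketch of the roughly equivalent <disk> element, reusing the same illustrative 'pool/image' volume and monitor host names as the example above, would be:

```xml
<!-- Sketch only: pool/image and the mon hosts mirror the example above -->
<disk type='network' device='disk'>
  <driver name='qemu' type='raw'/>
  <!-- no <auth> element, matching auth_supported=none above -->
  <source protocol='rbd' name='pool/image'>
    <host name='mon1.example.org' port='6321'/>
    <host name='mon2.example.org' port='6322'/>
    <host name='mon3.example.org' port='6322'/>
  </source>
  <target dev='vda' bus='virtio'/>
</disk>
```

Note that nothing in this config carries an sVirt label to the server: the only per-guest restriction it can express is an optional cephx credential, which is exactly the limitation described below.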
As it stands, the only approach I can see for isolating each virtual machine's disk(s) from those of other virtual machines is to make use of user authentication with these services. eg each virtual machine would need its own dedicated user account on the RBD/Gluster/iSCSI/NFS server, and the disk volumes for the VM would have to be made accessible solely to that user account. Assuming such a user account / disk mapping exists in the servers today, that can be made to work, but it is an incredibly awful solution to deal with when VMs are being dynamically created & deleted very frequently.

Today apps like OpenStack just have a single RBD username and password for everything they do. Any virtual machine running with RBD storage on OpenStack thus has no sVirt protection for its disk images, AFAICT. To protect images, OpenStack would have to dynamically create & delete user accounts on the RBD server and set up disk access for each of them. I don't see that kind of approach being viable.

IIUC, there is some mechanism at the IP stack level where the kernel can take the SELinux label of the process that establishes the network connection and pass it across to the server. If there were a way in the RBD API for libvirt to label the volumes, then potentially we could have a system where the RBD server did sVirt enforcement, based on the instructions from libvirt & the label of the client process.

Thoughts on what to do about this? Network based storage, where the network client is inside each QEMU process, is here to stay, so I don't think we can ignore the problem long term.

Regards,
Daniel
--
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|