On 06/06/2011 10:41 AM, Daniel P. Berrange wrote:
> What follows is a document outlining some thoughts I've been having
> on extending sVirt to allow confinement of applications which talk
> to libvirtd on the host, primarily focusing on use of SELinux, but
> also allowing a simple non-SELinux RBAC mechanism.
>
> Securing KVM virtualization hosts with MAC
> ==========================================
>
> This document looks at the task of securing KVM virtualization
> hosts using mandatory access control technologies, with focus
> on SELinux. At the time of writing there have been two phases
> of development, and this document makes proposals for a third
> phase.
>
> Phase 1: circa 2006
> -------------------
>
> Goal: Protect the host from a compromised virtual machine.
>
> The first phase of development had the modest goal of
> protecting the host from attack by a compromised virtual
> machine. To achieve this, the KVM processes are configured
> such that they will run under a confined security context
> ('virt_t' in the SELinux reference policy), which blocks
> access to any host resources not labelled ('virt_image_t')
> for use by virtual machines.
>
> The primary limitation of this initial implementation
> is that while the virtual host is secured, there is no
> protection between virtual machines. This can be considered
> a regression in isolation as compared to that offered by
> non-virtualized hosts. The second limitation is that the
> virtualization admin has to take care to ensure the host
> resources intended for use by the virtual machines are
> correctly labelled. This is a manual setup task unless
> the images are kept in a preset location (/var/lib/libvirt/images
> in the SELinux reference policy).
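
The manual labelling requirement described above can be checked with a
small script. What follows is only a minimal sketch, assuming a Linux
host where file labels are stored in the 'security.selinux' extended
attribute. The helper names (parse_context, file_context,
has_image_label) are hypothetical and are not part of libvirt or
libselinux.

```python
import os


def parse_context(context):
    """Split 'user:role:type[:range]' into fields; the range may itself contain ':'."""
    parts = context.split(":", 3)
    if len(parts) < 3:
        raise ValueError("not an SELinux context: %r" % (context,))
    user, role, setype = parts[:3]
    mls_range = parts[3] if len(parts) == 4 else ""
    return user, role, setype, mls_range


def file_context(path):
    """Read the raw label from the security.selinux xattr (Linux only)."""
    raw = os.getxattr(path, "security.selinux")
    # The kernel usually returns the label with a trailing NUL byte.
    return raw.rstrip(b"\0").decode("ascii")


def has_image_label(path, expected_type="virt_image_t"):
    """Return True if the file's SELinux type equals expected_type."""
    return parse_context(file_context(path))[2] == expected_type
```

An admin could run has_image_label() over disk images kept outside
/var/lib/libvirt/images to spot files that still need relabelling.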
>
>
> Phase 2: March 2009
> -------------------
>
> Goal: Protect virtual machines from each other
>
> The second phase of development has the goal of providing
> isolation between virtual machines that is comparable to
> that achieved between physical machines. This piece of
> work is commonly referred to as "sVirt". To achieve this,
> the KVM processes are each configured to run under a
> dedicated security context, which blocks access to any
> resources not explicitly assigned to that virtual machine.
> In the SELinux implementation, the base context "svirt_t"
> has a unique MCS category ("c240,c955") appended to form
> a unique security context "system_u:system_r:svirt_t:s0:c240,c955".
> For each host resource to be assigned to the virtual machine,
> the base context "svirt_image_t" is combined with the same
> MCS category to form a unique resource security context
> "system_u:object_r:svirt_image_t:s0:c240,c955".
>
> The assignment of virtual machine security contexts and
> labelling of resources can be done statically by the
> administrator / management application, or dynamically
> by the libvirtd daemon. The latter removes much of the
> administrator burden.
>
> The second phase has addressed the major guest security
> limitation of the first phase, and eased the burden placed
> on host administrators. Attention can now focus on the security
> of the host management software stack. Client applications
> communicate with the libvirtd daemon using a simple sockets
> based RPC protocol. Thus operations initiated by client
> applications which run under one security context are in
> fact invoked under the libvirtd daemon's security context.
> Since the libvirtd daemon is a highly privileged, almost
> unconfined process, this provides a means for applications
> to elevate their privileges.
>
> A second problem with the current model is seen when looking
> at guest migration between hosts.
> During migration, there
> are two QEMU processes running for the same virtual machine,
> one process on each host. The dynamic assignment of MCS
> values to form unique security contexts is done on a per host
> basis, so there is no guarantee that the VM on host A will be
> using (or be able to use) the same security context on the
> target host of migration. This is not necessarily a problem
> if the guest is using block devices, since block device inode
> labels are only visible to a single host. With a shared
> filesystem that supports SELinux labelling, like GFS2, both
> QEMU processes must run in the same security context to allow
> them both to access the associated files.
>
>
> Phase 3: June 2011
> ------------------
>
> Goal: Protect virtual machines from host applications
>
> The third phase of development has the primary goal of
> honouring the confinement of client applications talking
> to libvirtd, when performing operations on virtual machines
> and other managed objects (storage pools, host devices,
> virtual networks, secrets, etc). Every application connecting
> to libvirt has an associated security context. Every object
> managed by libvirtd will have an associated security context.
> When an operation is invoked via a libvirt API the client
> application security context will be checked against the
> target object context, before proceeding. Thus applications
> will not be able to make use of a libvirtd connection to
> perform operations that are otherwise blocked.
>
> The secondary goal is to add further flexibility and safety
> to the way MCS categories are assigned, and files are relabelled.
> Instead of maintaining a local database of assigned labels, there
> must be some shared storage where label usage can be recorded.
> At its simplest this can be an NFS share, with one file per MCS
> category and locking with fcntl(). An alternative would be to
> acquire leases using a lock manager such as sanlock.
> In addition,
> the guest configuration will be enhanced such that a guest can
> be assigned a statically chosen security context, but still make
> use of dynamic relabelling of resources. Finally the existing
> boolean mode of 'static' vs 'dynamic' label generation will be
> turned into a tri-state, introducing a 'hybrid' mode where the
> client supplies a custom base context, and the MCS part is still
> auto-generated.
>
>
> Usage scenarios
> ---------------
>
> To aid in development a few relevant core use cases
> or usage scenarios have been identified:
>
> 1. A virtual machine monitoring application
>
> For this example, consider the simple monitoring application
> 'virt-top'. This application displays a list of all virtual
> machines on the host and their associated resource utilization
> (CPU, disk, network). This application has no need to be able
> to stop/start/define virtual machines, nor do any operation
> related to host devices, storage, or networking. Traditionally
> this application is written to use a read only libvirt connection.
>
> With enhanced access control from libvirtd, the policy would define
> a new security context 'virt_top_t' for the 'virt-top' application.
> This policy would allow 'list', 'read', 'readstats' on the 'domain'
> object type.
>
>
> 2. A multi-guest, multi-user MLS enabled host
>
> For this example, consider a virtualization host with MLS policy
> that is running multiple virtual machines, for a variety of
> different users. A user with the security level "restricted"
> must not be allowed to control virtual machines with a security
> level of "confidential". Conversely a user with security level
> "secret" must not be allowed to create virtual machines with a
> security level of "unclassified".
>
> With enhanced access control from libvirtd, getpeercon() would
> provide the security context of the client application (user).
> The client context would be used to perform an AVC check when any API
> operation is invoked, thus ensuring that the client's MLS
> label is honoured in access control checks. The effect would be
> that when a 'restricted' user asked for a list of virtual machines
> only virtual machines at level 'restricted' or below would be
> returned. Or when a "secret" user asked to start a guest with
> a security level of 'unclassified', the operation would be denied.
>
>
> 3. Identity transitions from trusted agents
>
> For this example, consider a trusted agent such as libvirt-qpid,
> or libvirt-snmp, which translates the libvirt API from its native
> model, into an alternate access model. In such an example, the
> agent talking to libvirtd will have authenticated itself. The
> peer identity that libvirtd sees, however, is that of the agent,
> not the ultimate (end-user) client. In such a case it will be desirable
> to allow a trusted agent to transition to a different identity when
> performing operations.
>
> An end user running under context "unconfined_u:unconfined_r:virt_top_t:s0-s0:c0.c1023"
> may talk to the libvirt-qpid agent which runs under the context
> "system_u:system_r:virt_qpid_t:s0-s0:c0.c1023". The libvirt-qpid agent
> connects to libvirtd which sees 'virt_qpid_t' as the client type.
> The policy is written to allow transitions from 'virt_qpid_t' to
> the 'virt_top_t' type, so when the virt-top client connects to
> libvirt-qpid, it changes its identity to 'virt_top_t'. From that
> point onwards, all AVC checks honour the privileges of the ultimate
> end user application, rather than the libvirt-qpid intermediary.
> The same mechanism also ensures that the client application MLS
> level is transferred via the libvirt-qpid agent to libvirtd.
>
>
> Anticipated Development tasks
> -----------------------------
>
> 1.
>    Extend the domain XML to add a third attribute to the <seclabel>
>    element relabel="yes|no", to control whether libvirtd will
>    automatically label resources assigned to a guest. If the
>    existing 'mode' attribute is "dynamic", then relabelling will
>    default to enabled, while if it is 'static', then relabelling
>    will default to disabled. Also change 'mode' to allow a new
>    'hybrid' value.
>
> 2. Determine how to maintain/identify security labels for other
>    managed objects, including virStoragePoolPtr, virStorageVolPtr,
>    virSecretPtr, virNetworkPtr, virInterfacePtr, virNodeDevicePtr,
>    and host level APIs without any explicit managed object.
>
> 3. Extend XML for non-domain objects to implant security labels
>    as identified in step 2.
>
> 4. Create an internal virIdentity struct to store the identity
>    of the client. This will include at least the x509 distinguished
>    name, the SASL username, the SELinux context (getpeercon())
>    and UNIX username/group (SCM_CREDENTIALS).
>
> 5. Create a new public API to allow a client application to
>    supply a new identity, allowing them to pass a new x509
>    distinguished name, SASL username, SELinux context and
>    UNIX username/group.
>
> 6. Extend the libvirtd daemon such that the current identity
>    is stored in a thread local whenever invoking a public
>    API operation.
>
> 7. Extend the QEMU driver such that a suitable identity is
>    set when performing autonomous background operations
>    such as domain auto-start and core dump, in a non-API
>    thread.
>
> 8. Create a set of internal access control helper APIs in
>    $libvirt/src/accesscontrol/. There will be one API for each
>    managed object, taking an object pointer, and an operation
>    identifier (from an enum).
>
> 9. Create a simple impl of the access control APIs which defines
>    roles for groups of user identities, and grants privileges to
>    each role based on the operation names.
>    This allows for simple
>    testing of internal infrastructure, and an RBAC mechanism for
>    users who lack SELinux in their OS.
>
> 10. Implant access control checks into the main codepaths of every
>     driver method implementation in the QEMU driver.
>
> 11. Change the SELinux reference policy to define the new security
>     types and access vectors for the libvirt objects & associated
>     API calls.
>
> 12. Create a SELinux impl of the access control APIs which invokes
>     avc_has_perm() using the client's SELinux context. This is
>     intended to be the primary RBAC mechanism for Fedora/RHEL
>     virtualization hosts.
>
> 13. Write policy to confine targeted applications like virt-top,
>     virt-mem.
>
> 14. Extend libvirt-snmp, libvirt-cim, libvirt-qpid to pass through
>     the client identity to libvirtd.
>
>
> Technical Notes / Issues
> ------------------------
>
> 1. Adding new SELinux security classes / access vectors
>
> The SELinux security classes are defined in /usr/include/selinux/flask.h
> and access vectors in /usr/include/selinux/av_permissions.h. Both of these
> files are automatically generated by a script in the SELinux reference policy code
> '$serefpolicy/policy/flask/flask.py'. The master data files are in the
> same directory, 'access_vectors' and 'security_classes'. Once generated,
> the headers need to be manually copied into the libselinux package
> sources.
>
You do not need to do this anymore. libselinux does not care about the
access vectors; they are named in your application.
>
> APIs are added to libvirt on a very frequent basis. What is the process
> for applying access control to them if the SELinux policy does not yet
> have a suitable access vector / security class defined ? Do we need a
> generic 'admin' access vector we can use as catch all, until more
> specific vectors can be defined for the new APIs ? Desirable to avoid
> having to lock-step upgrade libvirt with SELinux policy for all additions
> to the libvirt public API.
>
Well, one benefit would be unconfined_t, although I am not sure it would
have access.
>
> 2. Security contexts for libvirt managed objects
>
> virDomainPtr: Already embedded in XML, unless using dynamic labelling
>               in which case context is assigned at startup.
>
> virNetworkPtr: No existing security context, nor any object on disk
>                that could be used. Follow example of domains and embed
>                <seclabel> in the XML. Assign unique MCS category per
>                network and ensure that daemons launched per network
>                (dnsmasq, radvd) inherit the MCS category.
>
> virSecretPtr: No existing security context. Secrets may be associated
>               with disk paths for VMs. Could copy the security context
>               of the guests and apply it to the secret, or have a
>               dedicated type svirt_secret_t and just copy the MCS
>               category. Hard to make it work for guests with dynamic
>               MCS assignment.
>
> virStoragePoolPtr: No existing security context. Some pool types have
>                    objects existing on the host filesystem, eg SCSI
>                    HBAs have a directory in sysfs, filesystem dirs
>                    have a directory somewhere, LVM has a directory
>                    for the volume group in /dev. Other pool types have
>                    no object on disk anywhere convenient, eg Sheepdog.
>                    Other pool types only have an object on disk when
>                    the pool is active (eg iSCSI, NFS). So there is
>                    nothing to use for API checks when the pool is
>                    inactive.
>
>                    Likely have to ignore whatever associated resource
>                    is on disk and just store a security context in the
>                    XML config as with virDomainPtr/virNetworkPtr.
>
>
> virStorageVolPtr: Currently reports the SELinux security label associated
>                   with the file on disk. Not all pool types necessarily
>                   have volumes with a corresponding file on disk (eg
>                   Sheepdog).
>
> virNodeDevicePtr: No existing security context. Most data comes from udev
>                   or HAL databases, though ultimately much is available
>                   in sysfs.
>
>                   When detaching PCI devices from host drivers, files
>                   in sysfs are used. When creating/deleting NPIV adapters
>                   sysfs is used.
>                   Thus could use sysfs file labels for AVC
>                   checks ?
>
> virConnectPtr: All host level APIs for which there is no other object
>                aside from the nebulous concept of the 'host'. APIs are
>                all readonly, eg query host capabilities, query free
>                memory, CPU stats, etc. What if we gain APIs to make
>                write calls ?
>
>
> virInterfacePtr: No existing security context. Currently using netcf to
>                  get data from /etc/sysconfig/network-scripts/ifcfg-XXX
>                  files, but can't assume those file names since that is
>                  Fedora/RHEL specific. Might not even use netcf if it
>                  talks directly to network manager. Does netcf need to
>                  expose a security label based on the ifcfg-XXX file ?
>
>
> 3. Security labelling config modes
>
>    When creating a guest the following XML snippets can be used.
>
>    a. Default type, dynamic MCS, automatic relabelling
>
>         <seclabel type='selinux' mode='dynamic' relabel='yes'/>
>
>
>    b. Custom type, dynamic MCS, automatic relabelling
>
>         <seclabel type='selinux' mode='hybrid' relabel='yes'>
>           <label>system_u:system_r:mysvirt_t</label>
>           <imagelabel>system_u:object_r:mysvirt_image_t</imagelabel>
>         </seclabel>
>
Yes, this would be cool, although I am not sure you need an image label,
since the MCS separation would still work on svirt_image_t. It would make
policy writing and selection easier if you did not change the type of the
image file. I would at least allow the admin to not specify an image
label.
>
>    c. Default type, dynamic MCS, no relabelling
>
>         <seclabel type='selinux' mode='dynamic' relabel='no'/>
>
>       Does this mode make any sense, since admin doesn't know
>       MCS category upfront ? Possibly only useful if the guest
>       only has readonly disks.
>
But you don't relabel readonly disks, correct? They are shared resources.
I would say this mode would not be used.
>
>    d.
>       Custom type, dynamic MCS, no relabelling
>
>         <seclabel type='selinux' mode='hybrid' relabel='no'>
>           <label>system_u:system_r:mysvirt_t</label>
>         </seclabel>
>
>       Same question about whether it makes sense
>
I don't think this makes sense.
>
>    e. Custom type, static MCS, auto relabelling
>
>         <seclabel type='selinux' mode='static' relabel='yes'>
>           <label>system_u:system_r:mysvirt_t:s0:c123,c456</label>
>           <imagelabel>system_u:object_r:mysvirt_image_t:s0:c123,c456</imagelabel>
>         </seclabel>
>
>
This is fine, though I am not sure it is legal in the MLS world. Although
I guess we could change the label to SystemHigh when not in use.
>    f. Custom type, static MCS, no relabelling
>
>         <seclabel type='selinux' mode='static' relabel='no'>
>           <label>system_u:system_r:mysvirt_t:s0:c123,c456</label>
>         </seclabel>
>
>
We have this now; this is static labeling.
> 4. Time at which to apply checks / source context
>
> It would be desirable to restrict the ability to use automatic file
> relabelling within the policy. If a client application defines a
> guest with the 'relabel=yes' attribute set, at what time should this
> usage be validated ?
>
> Validate at the time the guest is defined ? This ensures the app
> defining the guest is suitably privileged, but the file labels
> might be changed by the time the guest starts.
>
> Validate at the time the guest is started ? This minimises the
> window between access check being performed, and libvirtd actually
> performing the relabel operation. The app starting the guest might
> be different from the one defining the guest though ?
>
> Check at both define + start time ?
>
>
Probably most sane.
> What source security context should we use when performing autostart
> of virtual machines ? Normally when starting a VM, the check would be
> performed using the context of the client invoking the start API, but
> there is no such client when autostart occurs.
>
libselinux default.
> Should we instead perform a 'start' operation check whenever the
> 'autostart' flag is turned on by a client ? Or check the autostart
> operation against some generic source context ?
>
>
I think we leave this in the default_context file.

One last thing to think about: since libvirt can now be run under the
user's context in certain situations, libvirt should examine the range of
MLS/MCS labels associated with it and make sure that it can only assign
MCS labels within this range. For example, if I am a user running as
staff_t:s0-s0:c500, libvirt should only pick random labels between 0-500.
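
The range restriction suggested above can be sketched as follows. This is
only an illustration in Python, not libvirt's actual allocator. It assumes
the caller's category range is written as 'cLOW.cHIGH' (for example
'c0.c500') and that labels already in use are available as a set of
strings. The names parse_category_range and pick_mcs_label are
hypothetical.

```python
import random


def parse_category_range(spec):
    """Parse 'cLOW.cHIGH' (e.g. 'c0.c500') into an inclusive (low, high) pair."""
    low, high = spec.split(".")
    if not (low.startswith("c") and high.startswith("c")):
        raise ValueError("expected a range like 'c0.c500', got %r" % (spec,))
    return int(low[1:]), int(high[1:])


def pick_mcs_label(spec, in_use, rng=random, max_tries=1000):
    """Pick an unused 's0:cA,cB' label with both categories inside the range."""
    low, high = parse_category_range(spec)
    if high <= low:
        raise ValueError("need at least two categories to form a pair")
    for _ in range(max_tries):
        # Two distinct categories, sorted so that a < b.
        a, b = sorted(rng.sample(range(low, high + 1), 2))
        label = "s0:c%d,c%d" % (a, b)
        if label not in in_use:
            return label
    raise RuntimeError("no free MCS label found in %s" % spec)
```

For instance, pick_mcs_label("c0.c500", set()) returns a label whose two
categories both fall between c0 and c500. A real implementation would also
need the shared record of labels in use that the proposal discusses.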