---
 docs/internals-locking.html.in |  301 ++++++++++++++++++++++++++++++++++++++++
 1 files changed, 301 insertions(+), 0 deletions(-)
 create mode 100644 docs/internals-locking.html.in

diff --git a/docs/internals-locking.html.in b/docs/internals-locking.html.in
new file mode 100644
index 0000000..90054f0
--- /dev/null
+++ b/docs/internals-locking.html.in
@@ -0,0 +1,301 @@
+<html>
+  <body>
+    <h1>Resource Lock Manager</h1>
+
+    <ul id="toc"></ul>
+
+    <p>
+      This page describes the design of the resource lock manager
+      that is used for locking disk images with the QEMU driver.
+    </p>
+
+    <h2><a name="goals">Goals</a></h2>
+
+    <p>
+      The high level goal is to prevent the same disk image being
+      used by more than one QEMU instance at a time (unless the
+      disk is marked as shareable, or readonly). The scenarios
+      to be prevented are:
+    </p>
+
+    <ol>
+      <li>
+        Two different guests running concurrently, configured to
+        point at the same disk image.
+      </li>
+      <li>
+        One guest being started more than once on two different
+        machines due to an admin mistake.
+      </li>
+      <li>
+        One guest being started more than once on a single machine
+        due to a libvirt driver bug.
+      </li>
+    </ol>
+
+    <h2><a name="requirement">Requirements</a></h2>
+
+    <p>
+      The high level goal leads to a set of requirements
+      for the lock manager design:
+    </p>
+
+    <ol>
+      <li>
+        A lock must be held on a disk whenever a QEMU process
+        has the disk open.
+      </li>
+      <li>
+        The lock scheme must allow QEMU to be configured with
+        readonly, shared writable, or exclusive writable disks.
+      </li>
+      <li>
+        A lock must be held on a disk whenever libvirtd makes
+        changes to user/group ownership or SELinux labelling.
+      </li>
+      <li>
+        At least one locking implementation must allow use of
+        libvirtd on a single host without any admin config tasks.
+      </li>
+      <li>
+        A lock handover must be performed during the migration
+        process, where two QEMU processes will have the same disk
+        open concurrently.
+      </li>
+      <li>
+        The lock manager must be able to identify and kill the
+        process accessing the resource if the lock is revoked.
+      </li>
+    </ol>
+
+    <h2><a name="design">Design</a></h2>
+
+    <p>
+      The requirements call for a design with two distinct lockspaces:
+    </p>
+
+    <ol>
+      <li>
+        The <strong>primary lockspace</strong> is used to protect the content of
+        disk images. This will honour the disk sharing modes to
+        allow readonly/shared disks to be assigned to multiple
+        guests concurrently.
+      </li>
+      <li>
+        The <strong>secondary lockspace</strong> is used to protect the metadata
+        of disk images. This lock will be held whenever file
+        permissions / ownership / attributes are changed, and
+        is always exclusive, regardless of sharing mode. The
+        primary lock will be held prior to obtaining the secondary
+        lock.
+      </li>
+    </ol>
+
+    <p>
+      Within each lockspace the following operations will need to be
+      supported:
+    </p>
+
+    <ul>
+      <li>
+        <strong>Acquire object lock</strong>
+        Acquire locks on all resources initially
+        registered against an object
+      </li>
+      <li>
+        <strong>Release object lock</strong>
+        Release locks on all resources currently
+        registered against an object
+      </li>
+      <li>
+        <strong>Associate object lock</strong>
+        Associate the current process with an existing
+        set of locks for an object
+      </li>
+      <li>
+        <strong>Deassociate object lock</strong>
+        Deassociate the current process from an
+        existing set of locks for an object
+      </li>
+      <li>
+        <strong>Register resource</strong>
+        Register an initial resource against an object
+      </li>
+      <li>
+        <strong>Get object lock state</strong>
+        Obtain a representation of the current object
+        lock state
+      </li>
+      <li>
+        <strong>Acquire a resource lock</strong>
+        Register and acquire a lock for a resource
+        to be added to a locked object
+      </li>
+      <li>
+        <strong>Release a resource lock</strong>
+        Deregister and release a lock for a resource
+        to be removed from a locked object
+      </li>
+    </ul>
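+
+    <p>
+      To make the shared vs exclusive semantics of the primary
+      lockspace concrete, the sketch below shows one way an
+      fcntl() based implementation could map a resource lock onto
+      a POSIX advisory lock: a read lock for shared disks, a
+      write lock for exclusive disks. This is purely illustrative;
+      the helper name and its arguments are hypothetical and are
+      not part of the proposed plugin API.
+    </p>
+
+    <pre>
+  #include &lt;fcntl.h&gt;
+  #include &lt;unistd.h&gt;
+
+  /* Hypothetical helper: returns an open fd holding the lock,
+   * or -1 on failure. The fd must stay open for as long as
+   * the lock is required. */
+  static int acquireDiskLock(const char *path, int shared)
+  {
+      struct flock fl = {
+          .l_type   = shared ? F_RDLCK : F_WRLCK, /* shared: read lock */
+          .l_whence = SEEK_SET,
+          .l_start  = 0,
+          .l_len    = 0,                           /* 0 means whole file */
+      };
+      int fd = open(path, O_RDWR);
+
+      if (fd == -1)
+          return -1;
+
+      /* Non-blocking attempt: fails immediately if another
+       * process already holds a conflicting lock */
+      if (fcntl(fd, F_SETLK, &amp;fl) == -1) {
+          close(fd);
+          return -1;
+      }
+      return fd;
+  }
+    </pre>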
+
+    <h2><a name="impl">Plugin Implementations</a></h2>
+
+    <p>
+      Lock manager implementations are provided as LGPLv2+
+      licensed, dlopen()able library modules. A different
+      lock manager implementation may be used
+      for the primary and secondary lockspaces. With the
+      QEMU driver, these can be configured via the
+      <code>/etc/libvirt/qemu.conf</code> configuration
+      file by specifying the lock manager name:
+    </p>
+
+    <pre>
+      contentLockManager="fcntl"
+      metadataLockManager="fcntl"
+    </pre>
+
+    <p>
+      Lock manager implementations are free to support
+      both content and metadata locks; however, if the
+      plugin author is only able to handle one lockspace,
+      the other can be delegated to the standard fcntl
+      lock manager. The QEMU driver will load the lock
+      manager plugin binaries from the following location:
+    </p>
+
+    <pre>
+/usr/{lib,lib64}/libvirt/lock_manager/$NAME.so
+</pre>
+
+    <p>
+      The lock manager plugin must export a single ELF
+      symbol named <code>virLockDriverImpl</code>, which is
+      a static instance of the <code>virLockDriver</code>
+      struct. The struct is defined in the header file:
+    </p>
+
+    <pre>
+      #include &lt;libvirt/plugins/lock_manager.h&gt;
+    </pre>
+
+    <p>
+      All callbacks in the struct must be initialized
+      to non-NULL pointers. The semantics of each
+      callback are defined in the API docs embedded
+      in the previously mentioned header file.
+    </p>
+
+    <h2><a name="usagePatterns">Lock usage patterns</a></h2>
+
+    <p>
+      The following pseudo code illustrates the common
+      patterns of operations invoked on the lock
+      manager plugin callbacks.
+    </p>
+
+    <h3><a name="usageLockAcquire">Lock acquisition</a></h3>
+
+    <p>
+      Lock acquisition will always be performed from the
+      process that is to own the lock. This is typically
+      the QEMU child process, in between the fork+exec
+      pairing, but it may occasionally be held directly
+      by libvirtd.
+    </p>
+
+    <pre>
+      mgr = virLockManagerNew(lockPlugin,
+                              VIR_LOCK_MANAGER_MODE_CONTENT,
+                              VIR_LOCK_MANAGER_TYPE_DOMAIN);
+      virLockManagerSetParameter(mgr, "uuid", $uuid);
+      virLockManagerSetParameter(mgr, "name", $name);
+
+      foreach (initial disks)
+          virLockManagerAddResource(mgr,
+                                    VIR_LOCK_MANAGER_RESOURCE_TYPE_DISK,
+                                    $path, $flags);
+
+      if (virLockManagerAcquireObject(mgr) < 0)
+          ...abort...
+    </pre>
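+
+    <p>
+      As a rough illustration of the fork+exec point above, the
+      acquisition sequence would typically run in the child
+      process just before it execs the QEMU binary, so the lock
+      ends up owned by the QEMU process itself. The
+      <code>qemuBinary</code> and <code>qemuArgv</code> variables
+      below are hypothetical placeholders rather than real QEMU
+      driver code, and <code>mgr</code> is assumed to have been
+      prepared as in the sequence above.
+    </p>
+
+    <pre>
+      pid_t child = fork();
+
+      if (child == 0) {
+          /* Child: will become the QEMU process */
+          if (virLockManagerAcquireObject(mgr) < 0)
+              _exit(1);      /* never exec QEMU without the lock */
+
+          execv(qemuBinary, qemuArgv);
+          _exit(1);          /* exec failed */
+      }
+    </pre>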
+
+    <p>
+      The lock is implicitly released when the process
+      that acquired it exits; however, a process may
+      voluntarily give up the lock by running:
+    </p>
+
+    <pre>
+      virLockManagerReleaseObject(mgr);
+    </pre>
+
+    <h3><a name="usageLockAttach">Lock attachment</a></h3>
+
+    <p>
+      Any time a process needs to do work on behalf of
+      another process that holds a lock, it will associate
+      itself with the existing lock. This sequence is
+      identical to the previous one, except for the
+      last step:
+    </p>
+
+    <pre>
+      mgr = virLockManagerNew(contentLock,
+                              VIR_LOCK_MANAGER_MODE_CONTENT,
+                              VIR_LOCK_MANAGER_TYPE_DOMAIN);
+      virLockManagerSetParameter(mgr, "uuid", $uuid);
+      virLockManagerSetParameter(mgr, "name", $name);
+
+      foreach (current disks)
+          virLockManagerAddResource(mgr,
+                                    VIR_LOCK_MANAGER_RESOURCE_TYPE_DISK,
+                                    $path, $flags);
+
+      if (virLockManagerAttachObject(mgr, $pid) < 0)
+          ...abort...
+    </pre>
+
+    <p>
+      A lock association will always be explicitly broken
+      by running:
+    </p>
+
+    <pre>
+      virLockManagerDetachObject(mgr, $pid);
+    </pre>
+
+    <h3><a name="usageLiveResourceChange">Live resource changes</a></h3>
+
+    <p>
+      When adding a resource to an existing locked object (e.g. to
+      hotplug a disk into a VM), the lock manager will first
+      attach to the locked object, acquire a lock on the
+      new resource, then detach from the locked object.
+    </p>
+
+    <pre>
+      ... initial glue ...
+
+      if (virLockManagerAttachObject(mgr, $pid) < 0)
+          ...abort...
+
+      if (virLockManagerAcquireResource(mgr,
+                                        VIR_LOCK_MANAGER_RESOURCE_TYPE_DISK,
+                                        $path, $flags) < 0)
+          ...abort...
+
+      ... assign resource to object ...
+
+      virLockManagerDetachObject(mgr, $pid);
+    </pre>
+
+    <p>
+      Removing a resource from an existing object is an identical
+      process, but with <code>virLockManagerReleaseResource</code>
+      invoked instead.
+    </p>
+
+  </body>
+</html>
-- 
1.7.3.4