On 05/01/2012 03:56 PM, Eric Blake wrote:
On 05/01/2012 02:25 PM, Anthony Liguori wrote:
Thanks for sending this out Stefan.
Indeed.
This series adds the -open-hook-fd command-line option. Whenever QEMU needs to
open an image file it sends a request over the given UNIX domain socket. The
response includes the file descriptor or an errno on failure. Please see the
patches for details on the protocol.
The -open-hook-fd approach allows QEMU to support file descriptor passing
without changing -drive. It also supports snapshot_blkdev and other commands
that re-open image files.
Anthony Liguori <aliguori@xxxxxxxxxx> wrote most of these patches. I added a
demo -open-hook-fd server and some small fixes. Since Anthony is
traveling right now I'm sending the RFC for discussion.
What I like about this approach is that it's useful outside the block
layer and is conceptually simple from a QEMU PoV. We simply delegate
open() to libvirt and let libvirt enforce whatever rules it wants.
This is not meant to be an alternative to blockdev; even with
blockdev, I think we still want a mechanism like this.
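To make the "delegate open()" idea concrete, here is a minimal sketch of
the exchange: one side asks for a path over a UNIX socket, the other
applies its own policy and replies with either an fd (sent as SCM_RIGHTS
ancillary data) or an errno. This is not the wire format from the
patches. The JSON messages, ALLOWED_DIR, serve_one() and hooked_open()
are all made up for illustration, and it assumes Python 3.9+ on a UNIX
host for socket.send_fds()/recv_fds().

```python
import errno
import json
import os
import socket
import tempfile
import threading

# Hypothetical policy: the manager only hands out files under this directory.
ALLOWED_DIR = tempfile.mkdtemp()


def serve_one(sock):
    """Manager side: answer one open request with an fd or an errno."""
    req = json.loads(sock.recv(4096).decode())
    path = os.path.realpath(req["path"])
    if not path.startswith(os.path.realpath(ALLOWED_DIR) + os.sep):
        sock.sendall(json.dumps({"errno": errno.EACCES}).encode())
        return
    try:
        fd = os.open(path, req["flags"])
    except OSError as e:
        sock.sendall(json.dumps({"errno": e.errno}).encode())
        return
    try:
        # The fd travels as ancillary data next to the JSON reply.
        socket.send_fds(sock, [json.dumps({"errno": 0}).encode()], [fd])
    finally:
        os.close(fd)  # the receiver now holds its own copy


def hooked_open(sock, path, flags):
    """Requesting side: behaves like open(), but asks the manager."""
    sock.sendall(json.dumps({"path": path, "flags": flags}).encode())
    data, fds, _msg_flags, _addr = socket.recv_fds(sock, 4096, 1)
    reply = json.loads(data.decode())
    if reply["errno"]:
        raise OSError(reply["errno"], os.strerror(reply["errno"]), path)
    return fds[0]


def demo():
    manager, qemu = socket.socketpair()
    image = os.path.join(ALLOWED_DIR, "disk.img")
    with open(image, "wb") as f:
        f.write(b"image-data")
    results = []
    for path in (image, "/etc/passwd"):
        t = threading.Thread(target=serve_one, args=(manager,))
        t.start()
        try:
            fd = hooked_open(qemu, path, os.O_RDONLY)
            results.append(os.read(fd, 10))
            os.close(fd)
        except OSError as e:
            results.append(e.errno)
        t.join()
    manager.close()
    qemu.close()
    return results


if __name__ == "__main__":
    print(demo())
```

The point of the sketch is only that the policy check lives entirely on
the manager side; the requester sees ordinary open() semantics, an fd or
an errno.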
The overall series looks like it would be rather interesting. What sort
of timing restrictions are there? For example, the proposed
'drive-reopen' command (probably now deferred to qemu 1.2) would mean
that qemu would be calling back into libvirt in order to do the reopen.
If libvirt takes its time in passing back an open fd, is it going to
starve qemu from answering unrelated monitor commands in the meantime?
s/libvirt/kernel/g and your concerns are equally valid.
open() should never be called in a path that could block things. There's
always the possibility that we're on top of NFS and the open could time out.
For something like drive_reopen, we should use an asynchronous open() that
dispatches the open() in the posix-aio thread pool.
That's part of what's nice about this approach: we could still call file_open()
in the posix-aio thread pool...
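As a rough illustration of dispatching a blocking open() to a worker
pool, here is a sketch using Python's standard thread pool rather than
QEMU's posix-aio pool. open_async(), the completions queue and the tags
are hypothetical names; the pattern is just that the main loop submits
the open and picks up the result later instead of blocking on it.

```python
import concurrent.futures
import errno
import os
import queue
import tempfile

# Stand-in for a worker pool; not QEMU's posix-aio implementation.
pool = concurrent.futures.ThreadPoolExecutor(max_workers=4)

# Workers post (tag, fd, errno) here; the main loop drains it when ready.
completions = queue.Queue()


def open_async(path, flags, tag):
    """Submit a blocking open() to the pool and return immediately."""
    def work():
        try:
            completions.put((tag, os.open(path, flags), 0))
        except OSError as e:
            completions.put((tag, -1, e.errno))
    pool.submit(work)


def demo():
    results = {}
    with tempfile.NamedTemporaryFile() as tmp:
        open_async(tmp.name, os.O_RDONLY, "image")
        open_async(tmp.name + ".missing", os.O_RDONLY, "missing")
        # The caller is free to service other requests here; completions
        # may arrive in either order, so they are keyed by tag.
        for _ in range(2):
            tag, fd, err = completions.get(timeout=5)
            results[tag] = (fd >= 0, err)
            if fd >= 0:
                os.close(fd)
    return results


if __name__ == "__main__":
    print(demo())
```

The same shape would apply whether the worker calls the kernel's open()
or sends a request to libvirt: either way the slow call stays off the
path that answers monitor commands.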
I definitely want to make sure we avoid deadlock where libvirt is
waiting on a monitor command, but the monitor command is waiting on
libvirt to pass an fd.
Is this also an opportunity to request whether a particular fd must be
seekable vs. acceptable as a one-pass read or write, perhaps by whether
the command is 1 (seekable open) or 2 (one-pass open)?
I'm not really sure where the distinction lies...
I want the RPC to behave exactly like open(). So if we're assuming that open()
of a /dev/ file returns something that is ioctl()'able, then that's what libvirt
should return.
If we want to sort of do fd-transformation where a special protocol is used for
things like ioctl, that's fine, but it ought to be a different mechanism (that's
probably not nearly as generic).
For example,
migration is one-pass (and therefore libvirt passes a pipe which is
hooked up to a helper app that uses O_DIRECT), while block devices must
be seekable.
But migration doesn't involve doing an open(). This is not a replacement for fd
passing. This is a replacement for open() to make up for the facts that (1)
some management tools like libvirt cannot isolate guests with DAC and (2)
SELinux cannot be used to isolate guests across all file systems.
I would really prefer that the kernel fix this problem for us, but from what I'm
told, the problem lies with the NFS standards committee, so short of forking the
NFS protocol, there isn't much that the kernel can do.
Regards,
Anthony Liguori