Re: [PATCH v5 6/6] block: Enable qemu_open/close to work with fd sets

Corey Bryant <coreyb@xxxxxxxxxxxxxxxxxx> · Wed, 25 Jul 2012 23:57:37 -0400






On 07/25/2012 03:43 PM, Eric Blake wrote:
On 07/23/2012 07:08 AM, Corey Bryant wrote:
When qemu_open is passed a filename of the "/dev/fdset/nnn"
format (where nnn is the fdset ID), an fd with matching access
mode flags will be searched for within the specified monitor
fd set.  If the fd is found, a dup of the fd will be returned
from qemu_open.

Each fd set has a reference count.  The purpose of the reference
count is to determine if an fd set contains file descriptors that
have open dup() references that have not yet been closed.  It is
incremented on qemu_open and decremented on qemu_close.  It is
not until the refcount is zero that file desriptors in an fd set
can be closed.  If an fd set has dup() references open, then we
must keep the other fds in the fd set open in case a reopen
of the file occurs that requires an fd with a different access
mode.


+++ b/monitor.c
@@ -2551,6 +2551,91 @@ static void monitor_fdsets_set_in_use(Monitor *mon, bool in_use)
      }
  }

+void monitor_fdset_increment_refcount(Monitor *mon, int64_t fdset_id)
+{
+    mon_fdset_t *mon_fdset;
+
+    if (!mon) {
+        return;
+    }

Am I reading this code right by stating that 'if there is no monitor, we
don't increment the refcount'?  How does a monitor reattach affect
things?  Or am I missing something fundamental about the cases when
'mon==NULL' will exist?


Yes you're reading this correctly.

I'm pretty sure that mon will only be NULL if QEMU is started without a 
monitor.

If QEMU has a monitor, and libvirt disconnects it's connection to the 
qemu monitor, then I believe mon will remain non-NULL.

I'll plan on testing this out to verify though.  (I'm out most of this 
week and will be back full time starting next Tues.)

+int monitor_fdset_get_fd(Monitor *mon, int64_t fdset_id, int flags)
+{
+    mon_fdset_t *mon_fdset;
+    mon_fdset_fd_t *mon_fdset_fd;
+    int mon_fd_flags;
+
+    if (!mon) {
+        errno = ENOENT;
+        return -1;
+    }
+
+    QLIST_FOREACH(mon_fdset, &mon->fdsets, next) {
+        if (mon_fdset->id != fdset_id) {
+            continue;
+        }
+        QLIST_FOREACH(mon_fdset_fd, &mon_fdset->fds, next) {
+            if (mon_fdset_fd->removed) {
+                continue;
+            }
+
+            mon_fd_flags = fcntl(mon_fdset_fd->fd, F_GETFL);
+            if (mon_fd_flags == -1) {
+                return -1;

This says we fail on the first fcntl() failure, instead of trying other
fds in the set.  Granted, an fcntl() failure is probably the sign of a
bigger bug (such as closing an fd at the wrong point in time), so I
guess trying to go on doesn't make much sense once we already know we
are hosed.


I think I'll stick with it the way it is.  If fcntl() fails we might 
have a tainted fd set so I think we should fail.

+            }
+
+            switch (flags & O_ACCMODE) {
+            case O_RDWR:
+                if ((mon_fd_flags & O_ACCMODE) == O_RDWR) {
+                    return mon_fdset_fd->fd;
+                }
+                break;
+            case O_RDONLY:
+                if ((mon_fd_flags & O_ACCMODE) == O_RDONLY) {
+                    return mon_fdset_fd->fd;
+                }
+                break;

Do we want to allow the case where the caller asked for O_RDONLY, but
the set only has O_RDWR?  After all, the caller is getting a compatible
subset of what the set offers.


I don't see a problem with it.

+            case O_WRONLY:
+                if ((mon_fd_flags & O_ACCMODE) == O_WRONLY) {
+                    return mon_fdset_fd->fd;
+                }
+                break;

Likewise, should we allow a caller asking for O_WRONLY when the set
provides only O_RDWR?


I don't see a problem with it.


+/*
+ * Dups an fd and sets the flags
+ */
+static int qemu_dup(int fd, int flags)
+{
+    int i;
+    int ret;
+    int serrno;
+    int dup_flags;
+    int setfl_flags[] = { O_APPEND, O_ASYNC, O_DIRECT, O_NOATIME,
+                          O_NONBLOCK, 0 };
+
+    if (flags & O_CLOEXEC) {
+        ret = fcntl(fd, F_DUPFD_CLOEXEC, 0);

F_DUPFD_CLOEXEC is required by POSIX but not implemented on all modern
OS yet; you probably need some #ifdef and/or configure guards.


Ok

+        if (ret == -1 && errno == EINVAL) {
+            ret = dup(fd);
+            if (ret != -1 && fcntl_setfl(ret, O_CLOEXEC) == -1) {

You _can't_ call F_SETFL with O_CLOEXEC.  O_CLOEXEC only causes open()
to set FD_CLOEXEC; thereafter, including in the case of this dup, what
you want to do is instead set FD_CLOEXEC via F_SETFD (aka call
qemu_set_cloexec, not fcntl_setfl).


I know, this is a mistake.  I'm planning to replace fcntl_setfl() here 
with qemu_set_cloexec().

+                goto fail;
+            }
+        }
+    } else {
+        ret = dup(fd);
+    }
+
+    if (ret == -1) {
+        goto fail;
+    }
+
+    dup_flags = fcntl(ret, F_GETFL);
+    if (dup_flags == -1) {
+        goto fail;
+    }
+
+    if ((flags & O_SYNC) != (dup_flags & O_SYNC)) {
+        errno = EINVAL;
+        goto fail;
+    }
+
+    /* Set/unset flags that we can with fcntl */
+    i = 0;
+    while (setfl_flags[i] != 0) {
+        if (flags & setfl_flags[i]) {
+            dup_flags = (dup_flags | setfl_flags[i]);
+        } else {
+            dup_flags = (dup_flags & ~setfl_flags[i]);
+        }
+        i++;
+    }

Rather than looping one bit at a time, you should do this as a mask
operation.


I agree.

+
+    if (fcntl(ret, F_SETFL, dup_flags) == -1) {
+        goto fail;
+    }
+
+    /* Truncate the file in the cases that open() would truncate it */
+    if (flags & O_TRUNC ||
+            ((flags & (O_CREAT | O_EXCL)) == (O_CREAT | O_EXCL))) {
+        if (ftruncate(ret, 0) == -1) {
+            goto fail;
+        }
+    }
+
+    qemu_set_cloexec(ret);

If we're going to blindly set FD_CLOEXEC at the end of the day, rather
than try to honor O_CLOEXEC, then why not simplify the beginning of this
function:

This call to qemu_set_cloexec() was a mistake.  I'm planning on removing it.


     ret = fcntl(fd, F_DUPFD_CLOEXEC, 0);
     if (ret == -1 && errno == EINVAL) {
         ret = dup(fd);
         if (ret != -1) {
             qemu_set_cloexec(ret);
         }
     }
     if (ret == -1) {
         goto fail;
     }


I'll plan on sticking with the existing code in the beginning of this 
function with the modifications mentioned above.

--
Regards,
Corey


--
libvir-list mailing list
libvir-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/libvir-list