On Wed, Sep 10, 2014 at 07:27:40PM -0400, John Ferlan wrote:
On 09/08/2014 01:46 AM, Martin Kletzander wrote:This way it behaves more like the daemon itself does (acquiring a pidfile, deleting the socket before binding, etc.). Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=927369 Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1138604 Signed-off-by: Martin Kletzander <mkletzan@xxxxxxxxxx> --- src/rpc/virnetsocket.c | 57 ++++++++++++++++++++++++++++++++++++++++++++------ 1 file changed, 51 insertions(+), 6 deletions(-) diff --git a/src/rpc/virnetsocket.c b/src/rpc/virnetsocket.c index 42184bd..18a5a8f 100644 --- a/src/rpc/virnetsocket.c +++ b/src/rpc/virnetsocket.c @@ -51,9 +51,11 @@ #include "virlog.h" #include "virfile.h" #include "virthread.h" +#include "virpidfile.h" #include "virprobe.h" #include "virprocess.h" #include "virstring.h" +#include "dirname.h" #include "passfd.h" #if WITH_SSH2 @@ -544,7 +546,10 @@ int virNetSocketNewConnectUNIX(const char *path, const char *binary, virNetSocketPtr *retsock) { + char *binname = NULL; + char *pidpath = NULL; int fd, passfd = -1; + int pidfd = -1; virSocketAddr localAddr; virSocketAddr remoteAddr; @@ -583,16 +588,45 @@ int virNetSocketNewConnectUNIX(const char *path, goto error; } + if (!(binname = last_component(binary)) || binname[0] == '\0') { + virReportError(VIR_ERR_INTERNAL_ERROR, + _("Cannot determine basename for binary '%s'"), + binary); + goto error; + } + + if (virPidFileConstructPath(false, NULL, binname, &pidpath) < 0) + goto error; + + if ((pidfd = virPidFileAcquirePath(pidpath, false, getpid())) < 0) { + /* + * This can happen in a very rare case of two clients spawning two + * daemons at the same time, and the error in the logs that gets + * reset here can be a clue to some future debugging. + */ + virResetLastError(); + spawnDaemon = false;Since this is the failure path of connect(), the pidpath will be lost if the subsequent connect() succeeds.
I added VIR_FREE(pidpath) between virPidFileAcquirePath() and the condition.
+ goto retry; + } + if ((passfd = socket(PF_UNIX, SOCK_STREAM, 0)) < 0) { virReportSystemError(errno, "%s", _("Failed to create socket")); goto error; } /* - * We have to fork() here, because umask() is set - * per-process, chmod() is racy and fchmod() has undefined - * behaviour on sockets according to POSIX, so it doesn't - * work outside Linux. + * We already even acquired the pidfile, so noone else should be usings/noone/no one/
OK;
+ * @path right now. So we're OK to unlink it and paying attention to + * the return value makes no real sense here. Only if it's not an + * abstract socket, of course. + */ + if (path[0] != '@') + unlink(path); + + /* + * We have to fork() here, because umask() is set per-process, chmod() + * is racy and fchmod() has undefined behaviour on sockets according to + * POSIX, so it doesn't work outside Linux. */ if ((pid = virFork()) < 0) goto error; @@ -612,8 +646,9 @@ int virNetSocketNewConnectUNIX(const char *path, /* * OK, so the subprocces failed to bind() the socket. This may meanNot related but, s/subprocces/subprocess [or I guess technically child]
I fixed it to "child" when fixing the whole file, thanks for noticing that.
* that another daemon was starting at the same time and succeeded - * with its bind(). So we'll try connecting again, but this time - * without spawning the daemon. + * with its bind() (even though it should not happen because we + * using a pidfile for the race). So we'll try connecting again, + * but this time without spawning the daemon. */ spawnDaemon = false; goto retry;Like in the previous goto, since this is the failure path of a connect() and if the subsequent one succeeds, then 'pidfd', 'pidpath', 'passfd' are all leaked.
Closing these in front of return 0; looked weird, so I closed them before the goto;
@@ -632,6 +667,12 @@ int virNetSocketNewConnectUNIX(const char *path, goto error; } + /* + * Do we need to eliminate the super-rare race here any more? It would + * need incorporating the following VIR_FORCE_CLOSE() into a + * virCommandHook inside a virNetSocketForkDaemon(). + */ + VIR_FORCE_CLOSE(pidfd);Well - you caused me to look... it's interesting to note that "virNetSocketForkDaemon()" is bounded by "#ifndef WIN32", but it's a bit different here. Yes, I know it says *UNIX in the prototype, but yet virNetSocketNewConnectUNIX is bounded by "#ifdef HAVE_SYS_UN_H"...
I haven't heard from anyone having problem with this, even though it doesn't look very good, you're right. On the other hand what's the probability of someone out there compiling libvirt on non-win32 without UNIX sockets? :)
Of course any time there's a "timing window" available - something will eventually find it (whether or not there a full-moon, on the 13th of the month, when tides are extraordinarily high, or whatever other strange thing happens when software goes bad).
Yes, it will. Although there was a window between and this only affects starting virsh in session mode without any daemon, so it's not that serious, I guess.
Would another option to tighten the window be to pass the 'pidfd' (by reference to ensure a VIR_FREE(*pidfd); does what you expect) to the call and do the close "later" or closer to the condition causing the race? Optionally adjust the code to return 'cmd' and run from here assuming that's where the race is.
I could pass the fd into virNetSocketForkDaemon(), close it before starting daemon or just close it after fork (that would keep it opened in the process that's going to exec() libvirtd and thanks to O_CLOEXEC it would be closed after daemon is started. The window would still be there though, just few milliseconds smaller. Or we (and I *really* don't mean myself) could pass the fd to the pidfile into the daemon which would eliminate the problem completely.
Perhaps eblake will have an opinion and thoughts on the 'super-rare' condition...if (virNetSocketForkDaemon(binary, passfd) < 0) goto error; } @@ -648,8 +689,12 @@ int virNetSocketNewConnectUNIX(const char *path, return 0; error: + if (pidfd > 0) + virPidFileDeletePath(pidpath);Is '0' a valid/possible 'pidfd'? Other places use pidfd != -1 and virPidFileReleasePath() which does the CLOSE as well and changes the order to avoid a different race.
Well, it's a valid FD, although it probably corresponds to console i/o or pty or what is it. It is, nevertheless, a valid file descriptor. That's one of the reasons I initialized it with -1; Sending a v2 with your comments addressed even though Cole ACK'd it (as it fixes the issue it shoul've fixed). Thanks for the review, Martin
Attachment:
signature.asc
Description: Digital signature
-- libvir-list mailing list libvir-list@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/libvir-list