On 07/07/2011 08:17 AM, Daniel P. Berrange wrote: > From: "Daniel P. Berrange" <berrange@xxxxxxxxxx> > > If libvirtd crashes then the pidfile may not be cleaned up, > making a later restart fail, even though the original process > no longer exists. > > Instead of simply using file creation as the test for successful > pidfile acquisition, actually take out a lock on the pidfile. > > To avoid races, after locking the pidfile, verify that the > inode hasn't changed. > > Also make sure the unprivileged libvirtd now acquires the > pidfile, instead of relying on the UNIX socket to ensure > mutual exclusion Cool idea. > -static int daemonWritePidFile(const char *pidFile, const char *argv0) > -{ > +static int > +daemonAcquirePidFile(const char *argv0, const char *pidFile) { Why'd you hoist the { to the previous line? Our convention has been (slowly) leaning towards a function body starting on column 1. > int fd; > - FILE *fh; > char ebuf[1024]; > + unsigned long long pid = (unsigned long long)getpid(); See my comments in an earlier thread about our existing assumptions that pid_t fits in int. In fact, I would be okay with: verify(sizeof(pid_t) <= sizeof(int)); int pid = getpid(); char pidstr[INT_BUFSIZE_BOUND(pid)]; ... snprintf(pidstr, sizeof(pidstr), "%u", pid); > + char pidstr[INT_BUFSIZE_BOUND(pid)]; > > if (pidFile[0] == '\0') > return 0; > > - if ((fd = open(pidFile, O_WRONLY|O_CREAT|O_EXCL, 0644)) < 0) { > - VIR_ERROR(_("Failed to open pid file '%s' : %s"), > - pidFile, virStrerror(errno, ebuf, sizeof ebuf)); > - return -1; > - } > + while (1) { > + struct stat a, b; > + if ((fd = open(pidFile, O_WRONLY|O_CREAT, 0644)) < 0) { > + VIR_ERROR(_("%s: Failed to open pid file '%s' : %s"), > + argv0, pidFile, virStrerror(errno, ebuf, sizeof ebuf)); > + return -1; > + } > + > + if (fstat(fd, &b) < 0) { > + VIR_ERROR(_("%s: Pid file '%s' disappeared: %s"), Misleading error message. fstat can indeed fail (although such failures are rare), but not because a file disappeared - after all, you have an fd open to the file. > - if (fprintf(fh, "%lu\n", (unsigned long)getpid()) < 0) { > - VIR_ERROR(_("%s: Failed to write to pid file '%s' : %s"), > - argv0, pidFile, virStrerror(errno, ebuf, sizeof ebuf)); > - VIR_FORCE_FCLOSE(fh); > - return -1; > - } > + snprintf(pidstr, sizeof(pidstr), "%llu", pid); > > - if (VIR_FCLOSE(fh) == EOF) { > - VIR_ERROR(_("%s: Failed to close pid file '%s' : %s"), > - argv0, pidFile, virStrerror(errno, ebuf, sizeof ebuf)); > - return -1; > + if (safewrite(fd, pidstr, strlen(pidstr)) < 0) { Nice conversion from FILE* to write(). > @@ -1436,6 +1453,12 @@ int main(int argc, char **argv) { > umask(old_umask); > } > > + /* Try to claim the pidfile, existing if we can't */ s/existing/exiting/ > + if ((pid_file_fd = daemonAcquirePidFile(argv[0], pid_file)) < 0) { > + ret = VIR_DAEMON_ERR_PIDFILE; > + goto cleanup; > + } > + > if (!(srv = virNetServerNew(config->min_workers, > config->max_workers, > config->max_clients, > @@ -1570,8 +1593,10 @@ cleanup: > } > VIR_FORCE_CLOSE(statuswrite); > } > - if (pid_file) > - unlink (pid_file); > + if (pid_file_fd != -1) { > + unlink(pid_file); > + VIR_FORCE_CLOSE(pid_file_fd); Swap these two lines. Not that flock (or even libvirtd) works on mingw, but mingw doesn't like unlink() on an open fd. -- Eric Blake eblake@xxxxxxxxxx +1-801-349-2682 Libvirt virtualization library http://libvirt.org
Attachment:
signature.asc
Description: OpenPGP digital signature
-- libvir-list mailing list libvir-list@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/libvir-list