On Thu, Mar 22, 2018 at 11:16:47AM +0100, Michal Privoznik wrote:
> On 03/15/2018 01:31 PM, Peter Krempa wrote:
> > On Wed, Mar 14, 2018 at 17:05:38 +0100, Michal Privoznik wrote:
> >> Before we exec() qemu we have to spawn pr-helper processes for
> >> all managed reservations (well, technically there can be only one).
> >> The only caveat is that we should place the process into the same
> >> namespace and cgroup as qemu (so that it shares the same view of
> >> the system). But we can do that only after we've forked. That
> >> means calling the setup function between fork() and exec().
> >>
> >> Signed-off-by: Michal Privoznik <mprivozn@xxxxxxxxxx>
> >> ---
> >>  src/qemu/qemu_process.c | 169 ++++++++++++++++++++++++++++++++++++++++++++++++
> >>  1 file changed, 169 insertions(+)
> >>
> >
> >
> >> +
> >> +    priv->prPid = cpid;
> >> +    ret = 0;
> >> + cleanup:
> >> +    if (ret < 0) {
> >> +        virCommandAbort(cmd);
> >> +        virProcessKillPainfully(cpid, true);
> >> +    }
> >> +    virCommandFree(cmd);
> >> +    if (pidfile) {
> >> +        if (unlink(pidfile) < 0 &&
> >> +            errno != ENOENT &&
> >> +            !virGetLastError())
> >> +            virReportSystemError(errno,
> >> +                                 _("Cannot remove stale PID file %s"),
> >> +                                 pidfile);
> >> +        VIR_FREE(pidfile);
> >
> > By removing the pidfile, you can't make sure later that the PR helper
> > program is still the same process, thus when killing the VM by the PID
> > number only you might actually kill a different process.
> >
> > I unfortunately already deleted the top of the message so you'll need
> > to fix the function qemuProcessKillPRDaemon to see whether it's the
> > same process, similarly to what we do when reconnecting to qemu.

In general, when pidfiles are locked using lockf()/fcntl(), you should
never unlink them unless you have first acquired the lock, and even then
the code that acquires the lock needs to be made robust against race
conditions. Our virPidFileAcquirePath copes with racing unlinks, but
QEMU's code does not.
We should fix QEMU in this respect too.

> Okay, let me write here what I told you in person. I've tried a couple
> of approaches to do proper locking of the pidfile:
>
> a) run qemu-pr-helper -f $pidfile in combination with
> virPidFileForceCleanupPath(). This doesn't work because qemu-pr-helper
> uses lockf() to lock the pidfile. Libvirt uses F_SETLK. These two
> locks don't know about each other. Sure, we can adapt virPid* to do
> lockf()-ing. But we will fail miserably if qemu ever decides to change
> that to, say, fcntl().

I don't think that is correct - lockf() is just a thin wrapper around
fcntl() F_SETLK from what I understand. Many systems implement this
equivalence, although technically POSIX leaves it undefined.

> b) I've changed qemu-pr-helper to use fcntl() to lock the pidfile.
> They have a nice wrapper over it - qemu_lock_fd(). This still did not
> fly. Their wrapper prefers F_OFD_SETLK over F_SETLK. OFD stands for
> Open File Description locks. The lock is not associated with the
> process but with the FD and thus survives fork(). Again, incompatible
> locks. If a file is locked with F_OFD_SETLK another process can
> F_SETLK it successfully.

QEMU does the OFD_SETLK/SETLK magic because its APIs were primarily
designed for use with the block layer, where it must have sane semantics
to prevent accidental loss of the lock. The limitations of F_SETLK are
not really an issue for pidfiles, which are opened once and never
closed. So it would be valid to change QEMU to use fcntl(F_SETLK)
exclusively for pidfiles. This would be semantically compatible with
lockf() on most systems, and get QEMU away from the under-specified
semantics of lockf().

> c) My third attempt was to not rely on qemu-pr-helper locking at all
> (after all, they can change it anytime, which could break our tailored
> solution). So I've modified qemuProcessStartPRDaemonHook() so it calls
> virPidFileAcquirePath() and leaks the pidfile FD to qemu-pr-helper.
> This looked promising: command hooks are called just before exec(),
> after all FDs are closed. Unfortunately, virPidFileAcquirePath() sets
> the FD_CLOEXEC flag, so no leaking is possible. Sure, I can call
> virSetInherit() to cancel the CLOEXEC flag. BUT, here is the proposal:
>
> IIUC the only problem we have with storing the PID and killing it
> later is that we might end up killing a different process than
> qemu-pr-helper, because qemu-pr-helper may die and another process
> might be spawned with its PID. Well, this would be solved with the
> events I'm mentioning in the cover letter. The idea is that qemu sends
> us an event as soon as the qemu-pr-helper process dies. So we would
> spawn a new one and update the stored PID. This way we would never
> kill the wrong process.
>
> BTW: don't look into qemuProcessReconnect. It suffers from the same
> problem. If libvirtd is stopped, qemu quits and another process is
> spawned with its PID, we kill the new process as soon as we try to
> reconnect to the monitor, because connecting to the monitor fails
> (obviously) so we jump onto the error label where qemuProcessStop()
> is called. Worse, if there's an OOM situation and we fail to
> asprintf() the pidfile path, we jump onto the label too (and massacre
> the process we think is qemu).
>
> My proposal is to:
> 1) keep tracking the PID as-is now,
> 2) forbid starting domains with <reservations managed='yes'/> until
> the events are implemented.
>
> Alternatively, I can go with c) (in the end I have it working locally)
> and do all the changes necessary. Frankly, I don't have a preference.
> I also wonder if the API design for events and the query-* command for
> reconnect might make us lean towards one of the options.

I would suggest doing both a) and b).
Regards,
Daniel
--
|: https://berrange.com        -o-   https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-           https://fstop138.berrange.com :|
|: https://entangle-photo.org  -o-   https://www.instagram.com/dberrange :|

--
libvir-list mailing list
libvir-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/libvir-list