On 03/15/2018 01:31 PM, Peter Krempa wrote:
> On Wed, Mar 14, 2018 at 17:05:38 +0100, Michal Privoznik wrote:
>> Before we exec() qemu we have to spawn pr-helper processes for
>> all managed reservations (well, technically there can be only one).
>> The only caveat there is that we should place the process into
>> the same namespace and cgroup as qemu (so that it shares the same
>> view of the system). But we can do that only after we've forked.
>> That means calling the setup function between fork() and exec().
>>
>> Signed-off-by: Michal Privoznik <mprivozn@xxxxxxxxxx>
>> ---
>>  src/qemu/qemu_process.c | 169 ++++++++++++++++++++++++++++++++++++++++++++++++
>>  1 file changed, 169 insertions(+)
>>
>
>> +
>> +    priv->prPid = cpid;
>> +    ret = 0;
>> + cleanup:
>> +    if (ret < 0) {
>> +        virCommandAbort(cmd);
>> +        virProcessKillPainfully(cpid, true);
>> +    }
>> +    virCommandFree(cmd);
>> +    if (pidfile) {
>> +        if (unlink(pidfile) < 0 &&
>> +            errno != ENOENT &&
>> +            !virGetLastError())
>> +            virReportSystemError(errno,
>> +                                 _("Cannot remove stale PID file %s"),
>> +                                 pidfile);
>> +        VIR_FREE(pidfile);
>
> By removing the pidfile, you can't make sure later that the PR helper
> program is still the same process, thus when killing the VM by the pid
> number only you might actually kill a different process.
>
> I unfortunately already deleted the top of the message so you'll need to
> fix the function qemuProcessKillPRDaemon to see whether it's the same
> process, similarly to what we do when reconnecting to qemu.

Okay, let me write here what I told you in person. I've tried a couple of
approaches to do proper locking of the pidfile:

a) Run qemu-pr-helper -f $pidfile in combination with
virPidFileForceCleanupPath(). This doesn't work because qemu-pr-helper
uses lockf() to lock the pidfile, while libvirt uses F_SETLK. These two
locks don't know about each other. Sure, we can adapt virPid* to do
lockf()-ing, but we will fail miserably if qemu ever decides to change
that to, say, fcntl().

b) I've changed qemu-pr-helper to use fcntl() to lock the pidfile. They
have a nice wrapper over it - qemu_lock_fd(). This still did not fly:
their wrapper prefers F_OFD_SETLK over F_SETLK. OFD stands for Open File
Description locks; such a lock is associated not with a process but with
an FD, and thus survives fork(). Again, incompatible locks: if a file is
locked with F_OFD_SETLK, another process can F_SETLK it successfully.

c) My third attempt was to not rely on qemu-pr-helper's locking at all
(after all, they can change it anytime, which could break our tailored
solution). So I've modified qemuProcessStartPRDaemonHook() so that it
calls virPidFileAcquirePath() and leaks the pidfile FD to qemu-pr-helper.
This looked promising: command hooks are called just before exec(), after
all FDs are closed. Unfortunately, virPidFileAcquirePath() sets the
FD_CLOEXEC flag, so no leaking is possible. Sure, I can call
virSetInherit() to cancel the CLOEXEC flag.
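To make c) more concrete, here is roughly what the hook would end up
doing, written as a standalone sketch in plain POSIX C (acquire_pidfile()
is a made-up name and error reporting is stripped; virPidFileAcquirePath()
covers the open/lock/write part, virSetInherit(fd, true) the
flag-clearing part):

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static int
acquire_pidfile(const char *path)
{
    char buf[32];
    struct flock fl = {
        .l_type = F_WRLCK,    /* exclusive lock ... */
        .l_whence = SEEK_SET,
        .l_start = 0,
        .l_len = 0,           /* ... over the whole file */
    };
    int fd = open(path, O_WRONLY | O_CREAT, 0644);

    if (fd < 0)
        return -1;

    /* Traditional POSIX record lock: released only when the process
     * closes an FD for this file or exits, and preserved across
     * exec(), so it ends up held by qemu-pr-helper. */
    if (fcntl(fd, F_SETLK, &fl) < 0)
        goto error;

    snprintf(buf, sizeof(buf), "%lld\n", (long long) getpid());
    if (ftruncate(fd, 0) < 0 ||
        write(fd, buf, strlen(buf)) != (ssize_t) strlen(buf))
        goto error;

    /* The crucial bit: clear FD_CLOEXEC so the locked FD leaks across
     * exec() into qemu-pr-helper; this is exactly the flag that
     * virPidFileAcquirePath() sets. */
    if (fcntl(fd, F_SETFD, 0) < 0)
        goto error;

    return fd;

 error:
    close(fd);
    return -1;
}

Since the hook runs in the child between fork() and exec(), getpid()
there already returns the PID that qemu-pr-helper will have.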
BUT, here is the proposal: IIUC, the only problem we have with storing
the PID and killing it later is that we might end up killing a different
process than qemu-pr-helper, because qemu-pr-helper may die and another
process might be spawned with its PID. Well, this would be solved with
the events I'm mentioning in the cover letter. The idea is that qemu
sends us an event as soon as the qemu-pr-helper process dies. So we would
spawn a new one and update the stored PID. This way we would never kill
the wrong process.

BTW: don't look into qemuProcessReconnect; it suffers from the same
problem. If libvirtd is stopped, qemu quits, and another process is
spawned with its PID, we kill the new process as soon as we try to
reconnect to the monitor: connecting to the monitor fails (obviously), so
we jump onto the error label where qemuProcessStop() is called. Worse, if
there's an OOM situation and we fail to asprintf() the pidfile path, we
jump onto that label too (and massacre the process we think is qemu).

My proposal is to: 1) keep tracking the PID as we do now, and 2) forbid
starting domains with <reservations managed='yes'/> until the events are
implemented.

Alternatively, I can go with c) (in the end I have it working locally)
and do all the changes necessary. Frankly, I don't have a preference. I
also wonder whether the API design for the events and the query-* command
for reconnect might make us lean towards one of the options.
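FWIW, if we do keep killing by PID, the check Peter asks for in
qemuProcessKillPRDaemon() could look roughly like this (a sketch with
made-up names; get_start_time() is essentially what
virProcessGetStartTime() already does, and we'd record the start time
when spawning the helper):

#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Read field 22 (starttime) of /proc/<pid>/stat. */
static int
get_start_time(pid_t pid, unsigned long long *starttime)
{
    char path[64];
    char line[1024];
    char *p;
    size_t i;
    FILE *fp;

    snprintf(path, sizeof(path), "/proc/%lld/stat", (long long) pid);
    if (!(fp = fopen(path, "r")))
        return -1;
    if (!fgets(line, sizeof(line), fp)) {
        fclose(fp);
        return -1;
    }
    fclose(fp);

    /* The comm field (2) may contain spaces, so seek past its closing
     * parenthesis; starttime is then 19 space-separated tokens after
     * the state field (3). */
    if (!(p = strrchr(line, ')')) || strlen(p) < 2)
        return -1;
    p += 2;
    for (i = 0; i < 19; i++) {
        if (!(p = strchr(p, ' ')))
            return -1;
        p++;
    }
    return sscanf(p, "%llu", starttime) == 1 ? 0 : -1;
}

/* Signal the helper only if the PID still has the start time we
 * recorded when we spawned it; a recycled PID won't match. */
static int
kill_pr_helper(pid_t pid, unsigned long long recorded)
{
    unsigned long long now;

    if (get_start_time(pid, &now) < 0 || now != recorded)
        return -1;
    return kill(pid, SIGTERM);
}

That still doesn't close the race completely (the PID can in theory be
recycled between the check and the kill()), which is one more point in
favour of the event-based approach.

Michal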