On 2/9/23 10:56, Daniel P. Berrangé wrote:
> On Thu, Feb 09, 2023 at 09:52:00AM +0100, Michal Prívozník wrote:
>> On 2/9/23 00:13, Laine Stump wrote:
>>> I initially had the passt process being started in an identical
>>> fashion to the slirp-helper - libvirt was daemonizing the new process
>>> and recording its pid in a pidfile. The problem with this is that,
>>> since it is daemonized immediately, any startup error in passt happens
>>> after the daemonization, and thus isn't seen by libvirt - libvirt
>>> believes that the process has started successfully and continues on
>>> its merry way. The result was that sometimes a guest would be started,
>>> but there would be no passt process for qemu to use for network
>>> traffic.
>>>
>>> Instead, we should be starting passt in the same manner we start
>>> dnsmasq - we just exec it as normal (along with a request that passt
>>> create the pidfile, which is just another option on the passt
>>> commandline) and wait for the child process to exit; passt then has a
>>> chance to parse its commandline and complete all the setup prior to
>>> daemonizing itself; if it encounters an error and exits with a non-0
>>> code, libvirt will see the code and know about the failure. We can
>>> then grab the output from stderr, log that so the "user" has some idea
>>> of what went wrong, and then fail the guest startup.
>>>
>>> Signed-off-by: Laine Stump <laine@xxxxxxxxxx>
>>> ---
>>>  src/qemu/qemu_passt.c | 9 ++++-----
>>>  1 file changed, 4 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/src/qemu/qemu_passt.c b/src/qemu/qemu_passt.c
>>> index 0f09bf3db8..f640a69c00 100644
>>> --- a/src/qemu/qemu_passt.c
>>> +++ b/src/qemu/qemu_passt.c
>>> @@ -141,24 +141,23 @@ qemuPasstStart(virDomainObj *vm,
>>>      g_autofree char *passtSocketName = qemuPasstCreateSocketPath(vm, net);
>>>      g_autoptr(virCommand) cmd = NULL;
>>>      g_autofree char *pidfile = qemuPasstCreatePidFilename(vm, net);
>>> +    g_autofree char *errbuf = NULL;
>>>      char macaddr[VIR_MAC_STRING_BUFLEN];
>>>      size_t i;
>>>      pid_t pid = (pid_t) -1;
>>>      int exitstatus = 0;
>>>      int cmdret = 0;
>>> -    VIR_AUTOCLOSE errfd = -1;
>>>
>>>      cmd = virCommandNew(PASST);
>>>
>>>      virCommandClearCaps(cmd);
>>> -    virCommandSetPidFile(cmd, pidfile);
>>> -    virCommandSetErrorFD(cmd, &errfd);
>>> -    virCommandDaemonize(cmd);
>>> +    virCommandSetErrorBuffer(cmd, &errbuf);
>>>
>>>      virCommandAddArgList(cmd,
>>>                           "--one-off",
>>>                           "--socket", passtSocketName,
>>>                           "--mac-addr", virMacAddrFormat(&net->mac, macaddr),
>>> +                         "--pid", pidfile,
>>
>> The only problem with this approach is that our virPidFile*() functions
>> rely on locking the very first byte. And when reading the pidfile, we
>> try to lock the file and if we succeeded it means the file wasn't locked
>> which means the process holding the lock died and thus the pid in the
>> pidfile is stale.
>>
>> Now, I don't see passt locking the pidfile at all. So effectively, after
>> this patch qemuPasstStop() would do nothing (well, okay, it'll remove
>> the pidfile), qemuPasstSetupCgroup() does nothing, etc.
>>
>> What we usually do in this case, is: we let our code write the pidfile
>> (just like the current code does), but then have a loop that waits a bit
>> for socket to show up. If it doesn't in say 5 seconds we kill the child
>> process (which we know the PID of).
>> You can take inspiration from:
>> qemuDBusStart() or qemuProcessStartManagedPRDaemon().
>
> Busy waiting for sockets is nasty though. Depending on how passt is
> written it might not be needed. If passt creates the listen()
> socket and does all the important initialization steps that are liable
> to fail, *before* it daemonizes, then we can synchronize without busy
> waiting. ie waitpid() for passt leader process to exit. Then check if
> the socket exists. If it does, then passt has daemonized and is listening
> and running, if it does not, then passt failed.

That still requires passt to hold the pidfile open and locked, neither
of which is happening with the current code.

Michal
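
For reference, the lock-based liveness check discussed in this thread can
be sketched in plain POSIX C. This is only an illustration, not libvirt's
actual virPidFile*() code: pidfile_acquire() and pidfile_owner_alive() are
hypothetical names, and the use of fcntl() record locks on byte 0 is an
assumption about the mechanism. It shows why a pidfile that is written but
never locked looks stale to a reader that probes the lock.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Fill in a write-lock request covering only byte 0 of the file. */
static void first_byte_lock(struct flock *fl)
{
    memset(fl, 0, sizeof(*fl));
    fl->l_type = F_WRLCK;
    fl->l_whence = SEEK_SET;
    fl->l_start = 0;
    fl->l_len = 1;
}

/* Daemon side (hypothetical helper): create the pidfile, lock its first
 * byte and write our pid. The returned fd must stay open for the whole
 * lifetime of the process, because closing it drops the lock. */
int pidfile_acquire(const char *path)
{
    struct flock fl;
    char buf[32];
    int len;
    int fd = open(path, O_RDWR | O_CREAT, 0644);

    if (fd < 0)
        return -1;

    first_byte_lock(&fl);
    if (fcntl(fd, F_SETLK, &fl) < 0)
        goto error;

    len = snprintf(buf, sizeof(buf), "%ld\n", (long) getpid());
    if (ftruncate(fd, 0) < 0 || write(fd, buf, len) != len)
        goto error;

    return fd;

 error:
    close(fd);
    return -1;
}

/* Reader side (hypothetical helper): returns 1 if another process holds
 * the lock (owner alive), 0 if we could take the lock ourselves (pidfile
 * is stale), -1 on error. Must not be called by the lock owner itself:
 * POSIX record locks never conflict within one process, and the close()
 * below would drop the owner's lock. */
int pidfile_owner_alive(const char *path)
{
    struct flock fl;
    int ret;
    int fd = open(path, O_RDWR);

    if (fd < 0)
        return -1;

    first_byte_lock(&fl);
    if (fcntl(fd, F_SETLK, &fl) == 0)
        ret = 0;                       /* nobody held the lock */
    else if (errno == EACCES || errno == EAGAIN)
        ret = 1;                       /* someone else holds it */
    else
        ret = -1;

    close(fd);  /* also releases the probe lock if we obtained it */
    return ret;
}
```

Under this scheme, a pidfile written by a process that never locks it makes
pidfile_owner_alive() return 0 even while that process is still running,
which is the concern raised about qemuPasstStop() and
qemuPasstSetupCgroup().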