On Tue, Sep 22, 2020 at 12:07 PM Ján Tomko <jtomko@xxxxxxxxxx> wrote:
>
> On a Tuesday in 2020, Douglas, William wrote:
> >On Tue, Sep 22, 2020 at 1:11 AM Daniel P. Berrangé <berrange@xxxxxxxxxx> wrote:
> >>
> >> On Mon, Sep 21, 2020 at 12:05:48PM -0700, Douglas, William wrote:
> >> > On Tue, Sep 15, 2020 at 5:53 AM Daniel P. Berrangé <berrange@xxxxxxxxxx> wrote:
> >> > >
> >> > > On Thu, Aug 27, 2020 at 11:24:32AM -0700, William Douglas wrote:
> >> >
> >> > <snip>
> >> >
> >> > > > +virCHMonitorPtr
> >> > > > +virCHMonitorNew(virDomainObjPtr vm, const char *socketdir)
> >> > > > +{
> >> > > > +    virCHMonitorPtr ret = NULL;
> >> > > > +    virCHMonitorPtr mon = NULL;
> >> > > > +    virCommandPtr cmd = NULL;
> >> > > > +    int pings = 0;
> >> > > > +
> >> > > > +    if (virCHMonitorInitialize() < 0)
> >> > > > +        return NULL;
> >> > > > +
> >> > > > +    if (!(mon = virObjectLockableNew(virCHMonitorClass)))
> >> > > > +        return NULL;
> >> > > > +
> >> > > > +    mon->socketpath = g_strdup_printf("%s/%s-socket", socketdir, vm->def->name);
> >> > > > +
> >> > > > +    /* prepare to launch Cloud-Hypervisor socket */
> >> > > > +    if (!(cmd = chMonitorBuildSocketCmd(vm, mon->socketpath)))
> >> > > > +        goto cleanup;
> >> > > > +
> >> > > > +    if (virFileMakePath(socketdir) < 0) {
> >> > > > +        virReportSystemError(errno,
> >> > > > +                             _("Cannot create socket directory '%s'"),
> >> > > > +                             socketdir);
> >> > > > +        goto cleanup;
> >> > > > +    }
> >> > > > +
> >> > > > +    /* launch Cloud-Hypervisor socket */
> >> > > > +    if (virCommandRunAsync(cmd, &mon->pid) < 0)
> >> > > > +        goto cleanup;
> >> > > > +
> >> > > > +    /* get a curl handle */
> >> > > > +    mon->handle = curl_easy_init();
> >> > > > +
> >> > > > +    /* try to ping VMM socket 5 times to make sure it is ready */
> >> > > > +    while (pings < 5) {
> >> > > > +        if (virCHMonitorPingVMM(mon) == 0)
> >> > > > +            break;
> >> > > > +        if (pings == 5)
> >> > > > +            goto cleanup;
> >> > > > +
> >> > > > +        g_usleep(100 * 1000);
> >> > > > +    }
> >> > >
> >> > > This is highly undesirable. Is there no way to launch the CH process
> >> > > such that the socket is guaranteed to be accepting requests by the
> >> > > time it has forked into the background? Or is there a way to pass
> >> > > in a pre-opened FD for the monitor socket?
> >> > >
> >> > > This kind of wait with timeout will cause startup failures due to
> >> > > timeout under load.
> >> >
> >> > Currently the cloud-hypervisor process doesn't fork into the
> >> > background, so that was initially what I was thinking to add to
> >> > cloud-hypervisor. Would tracking the pid of the cloud-hypervisor
> >> > running in the background be required at that point (assuming the
> >> > socket path to control the daemon would be known and working given
> >> > virCommandRun returns successfully)?
> >>
> >> It doesn't especially matter whether cloud-hypervisor forks into
> >> the background itself, or whether libvirt forks it into the
> >> background when launching it.
> >>
> >> The important thing is simply to ensure that whichever startup
> >> approach is used can be implemented in a race-free manner without
> >> needing busy-waits or timeouts.
> >>
> >> If cloud-hypervisor forks into the background, then it would
> >> need to write a pidfile so libvirt can determine the pid that
> >> ended up running. The act of libvirt waiting for the pidfile
> >> to be written is potentially racy though, so that might not
> >> be best.
> >>
> >> Based on what we learnt from QEMU, I think being able to pass
> >> in a pre-created UNIX listener socket is the best. That lets
> >> libvirt immediately issue a connect() straight after forking the
> >> cloud-hypervisor process. If cloud-hypervisor successfully
> >> starts up, then the client socket becomes live and can be used.
> >> If it fails to start up, then the client socket libvirt has
> >> will get EOF and thus we'll see the startup failure. This
> >> avoids any looping/sleeping/timeout.
> >
> >I think I'm misreading/missing something from the qemu setup, but
> >looking at qemu_monitor.c's qemuMonitorOpenUnix function I see a retry
> >connect loop based on a timeout when trying to connect to a socket.
>
> That loop is guarded by: if (retry)
>
> The important code path:
>   qemuProcessLaunch
>     `_ qemuProcessWaitForMonitor
>        `_ qemuConnectMonitor
>
> qemuProcessWaitForMonitor sets retry=false if we're dealing with
> a recent enough QEMU that supports FD passing for its chardevs.
>
> The logic that creates the socket on libvirt's side lives (sadly)
> in qemuBuildChrChardevStr.
>

Ah, I was missing the possibility of retry being set to false (I saw it
would be set to false on a reconnect, I think, but didn't look much
further) and that qemuBuildChrChardevStr is the key part of the setup.
Thank you very much!
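For reference, below is a minimal sketch of the pre-created listener approach
Daniel describes, written with plain POSIX calls rather than the real libvirt
or cloud-hypervisor APIs. The helper names (make_listener, connect_client) are
hypothetical, and the hand-off of the listening FD to the child process (e.g.
via virCommandPassFD() plus whatever command-line option the VMM grows to
accept an inherited socket) is only noted in comments, since that interface is
exactly what is being discussed in this thread.

/*
 * Hypothetical sketch only: the parent creates and listens on the UNIX
 * socket before launching the VMM, hands the listening FD to the child
 * (e.g. via virCommandPassFD()), and then connect()s immediately. The
 * kernel queues the connection on the already-listening socket, so if
 * the VMM comes up and accepts, the connection goes live; if the VMM
 * exits without accepting, the client end sees EOF (or an error).
 * No polling, sleeping, or timeouts.
 */
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <sys/un.h>

static int
make_listener(const char *path)
{
    struct sockaddr_un addr = { .sun_family = AF_UNIX };
    int fd = socket(AF_UNIX, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;

    strncpy(addr.sun_path, path, sizeof(addr.sun_path) - 1);
    unlink(path);

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(fd, 1) < 0) {
        close(fd);
        return -1;
    }

    return fd;  /* this listening FD is what would be passed to the child */
}

static int
connect_client(const char *path)
{
    struct sockaddr_un addr = { .sun_family = AF_UNIX };
    int fd = socket(AF_UNIX, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;

    strncpy(addr.sun_path, path, sizeof(addr.sun_path) - 1);

    /* Succeeds straight away because the socket is already listening;
     * a later read() reports EOF/error if the VMM never comes up. */
    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        close(fd);
        return -1;
    }

    return fd;
}

On the libvirt side the two helpers would bracket virCommandRunAsync(): create
the listener, pass its FD to the command, launch it, then connect as a client,
which would make the ping loop and g_usleep() in the patch above unnecessary.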