On Thu, Aug 27, 2020 at 11:24:32AM -0700, William Douglas wrote: > This patch adds support for the following initial VM actions using the > Cloud-Hypervsior API: > * vm.create > * vm.delete > * vm.boot > * vm.shutdown > * vm.reboot > * vm.pause > * vm.resume > > To use the Cloud-Hypervisor driver, the v0.9.0 (the as of now current) > release of Cloud-Hypervisor is required to be installed. It would be useful to outline the general architecture / approach to mgmt. IIUC, cloud-hypervisor is similar to QEMU in that launching a VM just involves spawning a process for that VM. There is no central daemon or authority over all cloud-hypervisor VMs, thus libvirtd fills that role by acting as the mgmt daemon. In terms of communicating with the process, I'm seeing that basically everyting is REST API calls with JSON encoded data. > Signed-off-by: William Douglas <william.douglas@xxxxxxxxx> > --- > include/libvirt/virterror.h | 1 + > libvirt.spec.in | 32 ++ > meson.build | 5 + > meson_options.txt | 1 + It would be good to have a drvch.html.in file, well a drvch.rst file actually to give users an intro to the new driver. It should probably mention it is an early PoC not really targetting production usage at this time since there's many key features missing. > diff --git a/libvirt.spec.in b/libvirt.spec.in > index bb74443484..66edb1fa76 100644 > --- a/libvirt.spec.in > +++ b/libvirt.spec.in > @@ -14,6 +14,7 @@ > > # The hypervisor drivers that run in libvirtd > %define with_qemu 0%{!?_without_qemu:1} > +%define with_ch 0%{!?_without_ch:1} > %define with_lxc 0%{!?_without_lxc:1} > %define with_libxl 0%{!?_without_libxl:1} > %define with_vbox 0%{!?_without_vbox:1} > @@ -232,6 +233,9 @@ Requires: libvirt-daemon-driver-lxc = %{version}-%{release} > %if %{with_qemu} > Requires: libvirt-daemon-driver-qemu = %{version}-%{release} > %endif > +%if %{with_ch} > +Requires: libvirt-daemon-driver-ch = %{version}-%{release} > +%endif > # We had UML driver, but we've removed it. > Obsoletes: libvirt-daemon-driver-uml <= 5.0.0 > Obsoletes: libvirt-daemon-uml <= 5.0.0 > @@ -744,6 +748,20 @@ QEMU > %endif > > > +%if %{with_ch} > +%package daemon-driver-ch > +Summary: Cloud-Hypervisor driver plugin for the libvirtd daemon > +Requires: libvirt-daemon = %{version}-%{release} > +Requires: libvirt-libs = %{version}-%{release} > +Requires: /usr/bin/cloud-hypervisor > + > +%description daemon-driver-ch > +The Cloud-Hypervisor driver plugin for the libvirtd daemon, > +providing an implementation of the hypervisor driver APIs > +using Cloud-Hypervisor > +%endif The spec file targets Fedora and RHEL, and IIUC, cloud-hypervisor is not present in either of them. So for now the spec file should always disable building of the new driver. > diff --git a/meson.build b/meson.build > index dabd4196e6..a6759cb051 100644 > --- a/meson.build > +++ b/meson.build > @@ -1722,6 +1722,10 @@ elif get_option('driver_lxc').enabled() > error('linux and remote_driver are required for LXC') > endif > > +if not get_option('driver_ch').disabled() and host_machine.system() == 'linux' and conf.has('WITH_LIBVIRTD') > + conf.set('WITH_CH', 1) > +endif The driver relies on Curl and YAJL, so you'll need to do more there to automatically disable the driver if either of those is not present. Take a look a the handling of 'drive_qemu' for a reasonable illustration for YAJL, which you can adapt and include curl too. > diff --git a/src/ch/ch_conf.c b/src/ch/ch_conf.c > new file mode 100644 > index 0000000000..8769b0f7e2 > --- /dev/null > +++ b/src/ch/ch_conf.c > @@ -0,0 +1,239 @@ > +/* > + * Copyright Intel Corp. 2020 > + * > + * ch_driver.h: header file for Cloud-Hypervisor driver functions I'm surprised 'ninja test' didn't complain that the filename mentioned here doesn't match the actual filename. Likewise for other files. > +/* Functions */ > +virCapsPtr virCHDriverCapsInit(void) > +{ > + virCapsPtr caps; > + virCapsGuestPtr guest; > + > + if ((caps = virCapabilitiesNew(virArchFromHost(), > + false, false)) == NULL) > + goto cleanup; > + > + if (!(caps->host.numa = virCapabilitiesHostNUMANewHost())) > + goto cleanup; > + > + if (virCapabilitiesInitCaches(caps) < 0) > + goto cleanup; > + > + if ((guest = virCapabilitiesAddGuest(caps, > + VIR_DOMAIN_OSTYPE_HVM, > + caps->host.arch, > + NULL, > + NULL, > + 0, > + NULL)) == NULL) > + goto cleanup; > + > + if (virCapabilitiesAddGuestDomain(guest, > + VIR_DOMAIN_VIRT_CH, IIUC, cloud hypervisor is still using KVM as the hypervisor, merely providing a different userspace emulation layer. So in terms of the libvirt virt type, we should still be using VIR_DOMAIN_VIRT_KVM. No need to invent a VIR_DOMAIN_VIRT_CH. > + NULL, > + NULL, > + 0, > + NULL) == NULL) > + goto cleanup; This should also be testing of the required binary exists on the host, and if it does not, then don't claim to support guest domains. On 64-bit hosts does CH provide a way to strictly emulate just a 32-bit CPU, or will it always be 64-bit only ? If the former, then you need to add the 32-bit equivalent guest too. > + > + return caps; > + > + cleanup: > + virObjectUnref(caps); > + return NULL; > +} > + > +static void > +virCHDriverConfigDispose(void *obj) > +{ > + virCHDriverConfigPtr cfg = obj; > + > + VIR_FREE(cfg->stateDir); > + VIR_FREE(cfg->logDir); Use g_free for these. > +} > + > +static int > +chExtractVersionInfo(int *retversion) > +{ > + int ret = -1; > + unsigned long version; > + char *help = NULL; > + char *tmp; > + virCommandPtr cmd = virCommandNewArgList(CH_CMD, "--version", NULL); > + > + if (retversion) > + *retversion = 0; > + > + virCommandAddEnvString(cmd, "LC_ALL=C"); > + virCommandSetOutputBuffer(cmd, &help); > + > + if (virCommandRun(cmd, NULL) < 0) > + goto cleanup; > + > + tmp = help; > + > + /* expected format: cloud-hypervisor v<major>.<minor>.<micro> */ > + if ((tmp = STRSKIP(tmp, "cloud-hypervisor v")) == NULL) > + goto cleanup; > + > + if (virParseVersionString(tmp, &version, false) < 0) > + goto cleanup; > + > + // v0.9.0 is the minimum supported version > + if ((unsigned int)(version / 1000000) < 1) { > + if (((unsigned int)((unsigned long)(version % 1000000)) / 1000) < 9) { > + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", > + _("Cloud-Hypervisor version is too old (v0.9.0 is the minimum supported version)")); > + goto cleanup; > + } > + } This is a bit more convulted than needed - just encode the min version you want as an integer and do a single compare eg #define MIN_VERSION ((0 * 1000000) + (9 * 1000) + (0)) if (version < MIN_VERSION) { ..report error... } > +int chStrToInt(const char *str) > +{ > + int val; > + > + if (virStrToLong_i(str, NULL, 10, &val) < 0) > + return 0; > + > + return val; > +} This method is pretty pointless and its also not used anywhere > diff --git a/src/ch/ch_conf.h b/src/ch/ch_conf.h > new file mode 100644 > index 0000000000..04334130f7 > --- /dev/null > +++ b/src/ch/ch_conf.h > +#pragma once > + > +#include "virdomainobjlist.h" > +#include "virthread.h" > + > +#define CH_DRIVER_NAME "CH" > +#define CH_CMD "cloud-hypervisor" > + > +#define CH_STATE_DIR RUNSTATEDIR "/libvirt/ch" > +#define CH_LOG_DIR LOCALSTATEDIR "/log/libvirt/ch" > + > +typedef struct _virCHDriver virCHDriver; > +typedef virCHDriver *virCHDriverPtr; > + > +typedef struct _virCHDriverConfig virCHDriverConfig; > +typedef virCHDriverConfig *virCHDriverConfigPtr; > + > +struct _virCHDriverConfig { > + virObject parent; FWIW, I'd encourage you to use GObject rather than virObject for anything new, as our long term goal is to convert all code to use GObject alone and eliminate virObject > diff --git a/src/ch/ch_domain.c b/src/ch/ch_domain.c > new file mode 100644 > index 0000000000..a46641d50d > --- /dev/null > +++ b/src/ch/ch_domain.c > @@ -0,0 +1,219 @@ > +static void * > +virCHDomainObjPrivateAlloc(void *opaque G_GNUC_UNUSED) > +{ > + virCHDomainObjPrivatePtr priv; > + > + if (VIR_ALLOC(priv) < 0) > + return NULL; > + > + if (virCHDomainObjInitJob(priv) < 0) { > + VIR_FREE(priv); > + return NULL; > + } > + > + return priv; > +} > + > +static void > +virCHDomainObjPrivateFree(void *data) > +{ > + virCHDomainObjPrivatePtr priv = data; > + > + virCHDomainObjFreeJob(priv); > + VIR_FREE(priv); > +} > + > +static int > +virCHDomainObjPrivateXMLFormat(virBufferPtr buf, > + virDomainObjPtr vm) > +{ > + virCHDomainObjPrivatePtr priv = vm->privateData; > + virBufferAsprintf(buf, "<init pid='%lld'/>\n", > + (long long)priv->initpid); > + > + return 0; > +} > + > +static int > +virCHDomainObjPrivateXMLParse(xmlXPathContextPtr ctxt, > + virDomainObjPtr vm, > + virDomainDefParserConfigPtr config G_GNUC_UNUSED) > +{ > + virCHDomainObjPrivatePtr priv = vm->privateData; > + long long thepid; > + > + if (virXPathLongLong("string(./init[1]/@pid)", ctxt, &thepid) < 0) { > + VIR_WARN("Failed to load init pid from state %s", > + virGetLastErrorMessage()); > + priv->initpid = 0; > + } else { > + priv->initpid = thepid; > + } > + > + return 0; > +} What is this initpid needed for ? The virDomainObjPtr struct already lets us record a PID for the guest in the <domstatus> element. > diff --git a/src/ch/ch_driver.c b/src/ch/ch_driver.c > new file mode 100644 > index 0000000000..e5b027f71f > --- /dev/null > +++ b/src/ch/ch_driver.c > @@ -0,0 +1,937 @@ > +/* > + * Copyright Intel Corp. 2020 > + * > + * ch_driver.h: header file for Cloud-Hypervisor driver functions > + * > + * This library is free software; you can redistribute it and/or > + * modify it under the terms of the GNU Lesser General Public > + * License as published by the Free Software Foundation; either > + * version 2.1 of the License, or (at your option) any later version. > + * > + * This library is distributed in the hope that it will be useful, > + * but WITHOUT ANY WARRANTY; without even the implied warranty of > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > + * Lesser General Public License for more details. > + * > + * You should have received a copy of the GNU Lesser General Public > + * License along with this library. If not, see > + * <http://www.gnu.org/licenses/>. > + */ > + > +#include <config.h> > + > +#include "ch_conf.h" > +#include "ch_domain.h" > +#include "ch_driver.h" > +#include "ch_monitor.h" > +#include "ch_process.h" > +#include "datatypes.h" > +#include "driver.h" > +#include "viraccessapicheck.h" > +#include "viralloc.h" > +#include "virbuffer.h" > +#include "vircommand.h" > +#include "virerror.h" > +#include "virfile.h" > +#include "virlog.h" > +#include "virnetdevtap.h" > +#include "virobject.h" > +#include "virstring.h" > +#include "virtypedparam.h" > +#include "viruri.h" > +#include "virutil.h" > +#include "viruuid.h" > + > +#define VIR_FROM_THIS VIR_FROM_CH > + > +VIR_LOG_INIT("ch.ch_driver"); > + > +/* Functions */ > +static int > +chConnectURIProbe(char **uri) > +{ > + if (ch_driver == NULL) > + return 0; > + > + *uri = g_strdup("ch:///system"); > + return 1; > +} Are you not able to support the ch:///session mode for running under the user's unprivileged session ? > +/** > + * chDomainCreateXML: > + * @conn: pointer to connection > + * @xml: XML definition of domain > + * @flags: bitwise-OR of supported virDomainCreateFlags > + * > + * Creates a domain based on xml and starts it > + * > + * Returns a new domain object or NULL in case of failure. > + */ > +static virDomainPtr > +chDomainCreateXML(virConnectPtr conn, > + const char *xml, > + unsigned int flags) > +{ > + virCHDriverPtr driver = conn->privateData; > + virDomainDefPtr vmdef = NULL; > + virDomainObjPtr vm = NULL; > + virDomainPtr dom = NULL; > + unsigned int parse_flags = VIR_DOMAIN_DEF_PARSE_INACTIVE; > + > + virCheckFlags(VIR_DOMAIN_START_VALIDATE, NULL); > + > + if (flags & VIR_DOMAIN_START_VALIDATE) > + parse_flags |= VIR_DOMAIN_DEF_PARSE_VALIDATE_SCHEMA; > + > + > + if ((vmdef = virDomainDefParseString(xml, driver->xmlopt, > + NULL, parse_flags)) == NULL) > + goto cleanup; > + > + if (virDomainCreateXMLEnsureACL(conn, vmdef) < 0) > + goto cleanup; > + > + if (!(vm = virDomainObjListAdd(driver->domains, > + vmdef, > + driver->xmlopt, > + VIR_DOMAIN_OBJ_LIST_ADD_LIVE | > + VIR_DOMAIN_OBJ_LIST_ADD_CHECK_LIVE, > + NULL))) > + goto cleanup; > + > + vmdef = NULL; > + vm->persistent = 1; This isn't right - the virDomainCreateXML method is for creating a transient guest only. Also from this point onwards, for any failure you need to remove the dangling guest from the driver->domains list. > + > + if (virCHDomainObjBeginJob(vm, CH_JOB_MODIFY) < 0) > + goto cleanup; > + > + if (virCHProcessStart(driver, vm, VIR_DOMAIN_RUNNING_BOOTED) < 0) > + goto cleanup; > + > + dom = virGetDomain(conn, vm->def->name, vm->def->uuid, vm->def->id); > + > + virCHDomainObjEndJob(vm); > + > + cleanup: > + virDomainDefFree(vmdef); > + virDomainObjEndAPI(&vm); > + chDriverUnlock(driver); > + return dom; > +} > +static int > +chDomainShutdownFlags(virDomainPtr dom, > + unsigned int flags) > +{ > + virCHDomainObjPrivatePtr priv; > + virDomainObjPtr vm; > + virDomainState state; > + int ret = -1; > + > + virCheckFlags(VIR_DOMAIN_SHUTDOWN_INITCTL | > + VIR_DOMAIN_SHUTDOWN_SIGNAL, -1); These two shutdown methods would really only be for container based virt. When you have a KVM guest, only an APIC based trigger, or guest agent trigger make conceptual sense. > + > + if (!(vm = chDomObjFromDomain(dom))) > + goto cleanup; > + > + priv = vm->privateData; > + > + if (virDomainShutdownFlagsEnsureACL(dom->conn, vm->def, flags) < 0) > + goto cleanup; > + > + if (virCHDomainObjBeginJob(vm, CH_JOB_MODIFY) < 0) > + goto cleanup; > + > + if (virDomainObjCheckActive(vm) < 0) > + goto endjob; > + > + state = virDomainObjGetState(vm, NULL); > + if (state != VIR_DOMAIN_RUNNING && state != VIR_DOMAIN_PAUSED) { > + virReportError(VIR_ERR_OPERATION_INVALID, "%s", > + _("only can shutdown running/paused domain")); > + goto endjob; > + } else { > + if (virCHMonitorShutdownVM(priv->monitor) < 0) { > + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", > + _("failed to shutdown guest VM")); > + goto endjob; > + } > + } > + > + virDomainObjSetState(vm, VIR_DOMAIN_SHUTDOWN, VIR_DOMAIN_SHUTDOWN_USER); > + > + ret = 0; > + > + endjob: > + virCHDomainObjEndJob(vm); > + > + cleanup: > + virDomainObjEndAPI(&vm); > + return ret; > +} > +static int > +chDomainReboot(virDomainPtr dom, unsigned int flags) > +{ > + virCHDomainObjPrivatePtr priv; > + virDomainObjPtr vm; > + virDomainState state; > + int ret = -1; > + > + virCheckFlags(VIR_DOMAIN_REBOOT_INITCTL | > + VIR_DOMAIN_REBOOT_SIGNAL, -1); Same note as above. > + > + if (!(vm = chDomObjFromDomain(dom))) > + goto cleanup; > + > + priv = vm->privateData; > + > + if (virDomainRebootEnsureACL(dom->conn, vm->def, flags) < 0) > + goto cleanup; > + > + if (virCHDomainObjBeginJob(vm, CH_JOB_MODIFY) < 0) > + goto cleanup; > + > + if (virDomainObjCheckActive(vm) < 0) > + goto endjob; > + > + state = virDomainObjGetState(vm, NULL); > + if (state != VIR_DOMAIN_RUNNING && state != VIR_DOMAIN_PAUSED) { > + virReportError(VIR_ERR_OPERATION_UNSUPPORTED, "%s", > + _("only can reboot running/paused domain")); > + goto endjob; > + } else { > + if (virCHMonitorRebootVM(priv->monitor) < 0) { > + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", > + _("failed to reboot domain")); > + goto endjob; > + } > + } > + > + if (state == VIR_DOMAIN_RUNNING) > + virDomainObjSetState(vm, VIR_DOMAIN_RUNNING, VIR_DOMAIN_RUNNING_BOOTED); > + else > + virDomainObjSetState(vm, VIR_DOMAIN_RUNNING, VIR_DOMAIN_RUNNING_UNPAUSED); > + > + ret = 0; > + > + endjob: > + virCHDomainObjEndJob(vm); > + > + cleanup: > + virDomainObjEndAPI(&vm); > + return ret; > +} > + > +static int > +chDomainSuspend(virDomainPtr dom) > +{ > + virCHDomainObjPrivatePtr priv; > + virDomainObjPtr vm; > + int ret = -1; > + > + if (!(vm = chDomObjFromDomain(dom))) > + goto cleanup; > + > + priv = vm->privateData; > + > + if (virDomainSuspendEnsureACL(dom->conn, vm->def) < 0) > + goto cleanup; > + > + if (virCHDomainObjBeginJob(vm, CH_JOB_MODIFY) < 0) > + goto cleanup; > + > + if (virDomainObjCheckActive(vm) < 0) > + goto endjob; > + > + if (virDomainObjGetState(vm, NULL) != VIR_DOMAIN_RUNNING) { > + virReportError(VIR_ERR_OPERATION_UNSUPPORTED, "%s", > + _("only can suspend running domain")); > + goto endjob; > + } else { > + if (virCHMonitorSuspendVM(priv->monitor) < 0) { Just to explicitly confirm, "suspend" in libvirt terminology means pause execution of the guest CPUs, nothing else. Does that match semantics from CH ? > + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", > + _("failed to suspend domain")); > + goto endjob; > + } > + } > + > + virDomainObjSetState(vm, VIR_DOMAIN_PAUSED, VIR_DOMAIN_PAUSED_USER); > + > + ret = 0; > + > + endjob: > + virCHDomainObjEndJob(vm); > + > + cleanup: > + virDomainObjEndAPI(&vm); > + return ret; > +} > +static int chStateInitialize(bool privileged, > + const char *root, > + virStateInhibitCallback callback G_GNUC_UNUSED, > + void *opaque G_GNUC_UNUSED) > +{ > + if (root != NULL) { > + virReportError(VIR_ERR_INVALID_ARG, "%s", > + _("Driver does not support embedded mode")); > + return -1; > + } > + > + /* Check that the user is root, silently disable if not */ > + if (!privileged) { > + VIR_INFO("Not running privileged, disabling driver"); > + return VIR_DRV_STATE_INIT_SKIPPED; > + } Same question as earlier about why its forbidding non-root usage ? > +static virConnectDriver chConnectDriver = { > + .localOnly = true, > + .uriSchemes = (const char *[]){"CH", "Ch", "ch", "Cloud-Hypervisor", NULL}, Your driver URI is "ch:///system", so the only thing that should be in this list is "ch". > + .hypervisorDriver = &chHypervisorDriver, > +}; > + > +static virStateDriver chStateDriver = { > + .name = "CH", I'd suggest "cloud-hypervisor" to be a bit clearer > + .stateInitialize = chStateInitialize, > + .stateCleanup = chStateCleanup, > +}; > + > +int chRegister(void) > +{ > + if (virRegisterConnectDriver(&chConnectDriver, false) < 0) > + return -1; > + if (virRegisterStateDriver(&chStateDriver) < 0) > + return -1; > + return 0; > +} > diff --git a/src/ch/ch_monitor.c b/src/ch/ch_monitor.c > new file mode 100644 > index 0000000000..ccef70f719 > --- /dev/null > +++ b/src/ch/ch_monitor.c > +static int > +virCHMonitorBuildCPUJson(virJSONValuePtr content, virDomainDefPtr vmdef) > +{ > + virJSONValuePtr cpus; > + unsigned int maxvcpus = 0; > + unsigned int nvcpus = 0; > + virDomainVcpuDefPtr vcpu; > + size_t i; > + > + /* count maximum allowed number vcpus and enabled vcpus when boot.*/ > + maxvcpus = virDomainDefGetVcpusMax(vmdef); > + for (i = 0; i < maxvcpus; i++) { > + vcpu = virDomainDefGetVcpu(vmdef, i); > + if (vcpu->online) > + nvcpus++; > + } > + > + if (maxvcpus != 0 || nvcpus != 0) { > + cpus = virJSONValueNewObject(); > + if (virJSONValueObjectAppendNumberInt(cpus, "boot_vcpus", nvcpus) < 0) > + goto cleanup; > + if (virJSONValueObjectAppendNumberInt(cpus, "max_vcpus", vmdef->maxvcpus) < 0) > + goto cleanup; > + if (virJSONValueObjectAppend(content, "cpus", cpus) < 0) > + goto cleanup; > + } > + > + return 0; > + > + cleanup: > + virJSONValueFree(cpus); > + return -1; > +} > + > +static int > +virCHMonitorBuildKernelJson(virJSONValuePtr content, virDomainDefPtr vmdef) > +{ > + virJSONValuePtr kernel; > + > + if (vmdef->os.kernel == NULL) { > + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", > + _("Kernel image path in this domain is not defined")); > + return -1; > + } else { > + kernel = virJSONValueNewObject(); > + if (virJSONValueObjectAppendString(kernel, "path", vmdef->os.kernel) < 0) > + goto cleanup; > + if (virJSONValueObjectAppend(content, "kernel", kernel) < 0) > + goto cleanup; > + } > + > + return 0; > + > + cleanup: > + virJSONValueFree(kernel); > + return -1; > +} > + > +static int > +virCHMonitorBuildCmdlineJson(virJSONValuePtr content, virDomainDefPtr vmdef) > +{ > + virJSONValuePtr cmdline; > + > + cmdline = virJSONValueNewObject(); > + if (vmdef->os.cmdline) { > + if (virJSONValueObjectAppendString(cmdline, "args", vmdef->os.cmdline) < 0) > + goto cleanup; > + if (virJSONValueObjectAppend(content, "cmdline", cmdline) < 0) > + goto cleanup; > + } > + > + return 0; > + > + cleanup: > + virJSONValueFree(cmdline); > + return -1; > +} I'm kinda inclined to say it is overkill to have separate methods for kernel, initrd and cmdline. I'd handle all three at same time since they go hand in hand with each other. > +static int > +virCHMonitorBuildDiskJson(virJSONValuePtr disks, virDomainDiskDefPtr diskdef) > +{ > + virJSONValuePtr disk; > + > + if (diskdef->src != NULL && diskdef->src->path != NULL) { You must validate diskdef->type before accesing any fields for the source otherwise it is highly liable to crash on unexpected user input. Generally we'd expect a switch(diskdef->type) and cases to each applicable type you want to support, with errors raised for others. > + disk = virJSONValueNewObject(); > + if (virJSONValueObjectAppendString(disk, "path", diskdef->src->path) < 0) > + goto cleanup; > + if (diskdef->src->readonly) { > + if (virJSONValueObjectAppendBoolean(disk, "readonly", true) < 0) > + goto cleanup; > + } > + if (virJSONValueArrayAppend(disks, disk) < 0) > + goto cleanup; > + } > + > + return 0; > + > + cleanup: > + virJSONValueFree(disk); > + return -1; > +} > +static int > +virCHMonitorBuildNetJson(virJSONValuePtr nets, virDomainNetDefPtr netdef) > +{ > + virDomainNetType netType = virDomainNetGetActualType(netdef); > + char macaddr[VIR_MAC_STRING_BUFLEN]; > + virJSONValuePtr net; > + > + // check net type at first > + net = virJSONValueNewObject(); > + > + switch (netType) { > + case VIR_DOMAIN_NET_TYPE_ETHERNET: > + if (netdef->guestIP.nips == 1) { > + const virNetDevIPAddr *ip = netdef->guestIP.ips[0]; > + g_autofree char *addr = NULL; > + virSocketAddr netmask; > + g_autofree char *netmaskStr = NULL; > + if (!(addr = virSocketAddrFormat(&ip->address))) > + goto cleanup; > + if (virJSONValueObjectAppendString(net, "ip", addr) < 0) > + goto cleanup; > + > + if (virSocketAddrPrefixToNetmask(ip->prefix, &netmask, AF_INET) < 0) { > + virReportError(VIR_ERR_INTERNAL_ERROR, > + _("Failed to translate net prefix %d to netmask"), > + ip->prefix); > + goto cleanup; > + } > + if (!(netmaskStr = virSocketAddrFormat(&netmask))) > + goto cleanup; > + if (virJSONValueObjectAppendString(net, "mask", netmaskStr) < 0) > + goto cleanup; > + } else { ....report VIR_ERR_CONFIG_UNSUPPORTED... } As a general rule, when building the config from the user's XML, report as many errors as possible. It is preferrable for apps to see an error report than to have their config silently ignored. > + break; > + case VIR_DOMAIN_NET_TYPE_VHOSTUSER: > + if ((virDomainChrType)netdef->data.vhostuser->type != VIR_DOMAIN_CHR_TYPE_UNIX) { > + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", > + _("vhost_user type support UNIX socket in this CH")); > + goto cleanup; > + } else { > + if (virJSONValueObjectAppendString(net, "vhost_socket", netdef->data.vhostuser->data.nix.path) < 0) > + goto cleanup; > + if (virJSONValueObjectAppendBoolean(net, "vhost_user", true) < 0) > + goto cleanup; > + } > + break; > + case VIR_DOMAIN_NET_TYPE_BRIDGE: > + case VIR_DOMAIN_NET_TYPE_NETWORK: > + case VIR_DOMAIN_NET_TYPE_DIRECT: > + case VIR_DOMAIN_NET_TYPE_USER: > + case VIR_DOMAIN_NET_TYPE_SERVER: > + case VIR_DOMAIN_NET_TYPE_CLIENT: > + case VIR_DOMAIN_NET_TYPE_MCAST: > + case VIR_DOMAIN_NET_TYPE_INTERNAL: > + case VIR_DOMAIN_NET_TYPE_HOSTDEV: > + case VIR_DOMAIN_NET_TYPE_UDP: > + case VIR_DOMAIN_NET_TYPE_LAST: > + default: > + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", > + _("Only ethernet and vhost_user type network types are " > + "supported in this CH")); > + goto cleanup; > + } For _LAST/defalt, use virReportEnumRangeError > +static int > +virCHMonitorBuildVMJson(virDomainDefPtr vmdef, char **jsonstr) > +{ > + virJSONValuePtr content = virJSONValueNewObject(); > + int ret = -1; > + > + if (vmdef == NULL) { > + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", > + _("VM is not defined")); > + goto cleanup; > + } > + > + if (virCHMonitorBuildCPUJson(content, vmdef) < 0) > + goto cleanup; > + > + if (virCHMonitorBuildMemoryJson(content, vmdef) < 0) > + goto cleanup; > + > + if (virCHMonitorBuildKernelJson(content, vmdef) < 0) > + goto cleanup; > + > + if (virCHMonitorBuildCmdlineJson(content, vmdef) < 0) > + goto cleanup; > + > + if (virCHMonitorBuildInitramfsJson(content, vmdef) < 0) > + goto cleanup; > + > + if (virCHMonitorBuildDisksJson(content, vmdef) < 0) > + goto cleanup; > + > + if (virCHMonitorBuildNetsJson(content, vmdef) < 0) > + goto cleanup; > + > + if (!(*jsonstr = virJSONValueToString(content, false))) > + goto cleanup; > + > + ret = 0; > + > + cleanup: > + virJSONValueFree(content); > + return ret; > +} Obviously you're doing the bare minimum here to get something running which is fine for an initial merge. Ultimately though, it should be reporting erors for any types of devices you don't wish to support. > +static virCommandPtr > +chMonitorBuildSocketCmd(virDomainObjPtr vm, const char *socket_path) > +{ > + virCommandPtr cmd; > + > + if (vm->def == NULL) { > + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", > + _("VM is not defined")); > + return NULL; > + } > + > + if (vm->def->emulator != NULL) > + cmd = virCommandNew(vm->def->emulator); > + else > + cmd = virCommandNew(CH_CMD); If you populate the guest capabilities with the default emulator path, it should get populated into the virDomainDef, and thus be able to assume it is non-NULL. > + > + virCommandAddArgList(cmd, "--api-socket", socket_path, NULL); > + > + return cmd; > +} > + > +virCHMonitorPtr > +virCHMonitorNew(virDomainObjPtr vm, const char *socketdir) > +{ > + virCHMonitorPtr ret = NULL; > + virCHMonitorPtr mon = NULL; > + virCommandPtr cmd = NULL; > + int pings = 0; > + > + if (virCHMonitorInitialize() < 0) > + return NULL; > + > + if (!(mon = virObjectLockableNew(virCHMonitorClass))) > + return NULL; > + > + mon->socketpath = g_strdup_printf("%s/%s-socket", socketdir, vm->def->name); > + > + /* prepare to launch Cloud-Hypervisor socket */ > + if (!(cmd = chMonitorBuildSocketCmd(vm, mon->socketpath))) > + goto cleanup; > + > + if (virFileMakePath(socketdir) < 0) { > + virReportSystemError(errno, > + _("Cannot create socket directory '%s'"), > + socketdir); > + goto cleanup; > + } > + > + /* launch Cloud-Hypervisor socket */ > + if (virCommandRunAsync(cmd, &mon->pid) < 0) > + goto cleanup; > + > + /* get a curl handle */ > + mon->handle = curl_easy_init(); > + > + /* try to ping VMM socket 5 times to make sure it is ready */ > + while (pings < 5) { > + if (virCHMonitorPingVMM(mon) == 0) > + break; > + if (pings == 5) > + goto cleanup; > + > + g_usleep(100 * 1000); > + } This is highly undesirable. Is there no way to launch the CH process such that the socket is guaranteed to be accepting requests by the time it has forked into the background ? Or is there a way to pass in a pre-opened FD for the monitor socket ? This kind of wait with timeout will cause startup failures due to timeout under load. > + > + /* now has its own reference */ > + virObjectRef(mon); > + mon->vm = virObjectRef(vm); > + > + ret = mon; > + > + cleanup: > + virCommandFree(cmd); > + return ret; > +} > +struct data { > + char trace_ascii; /* 1 or 0 */ > +}; > + > +static void dump(const char *text, > + FILE *stream, > + unsigned char *ptr, > + size_t size, > + char nohex) > +{ > + size_t i; > + size_t c; > + > + unsigned int width = 0x10; > + > + if (nohex) > + /* without the hex output, we can fit more on screen */ > + width = 0x40; > + > + fprintf(stream, "%s, %10.10lu bytes (0x%8.8lx)\n", text, (unsigned long)size, > + (unsigned long)size); > + > + for (i = 0; i < size; i += width) { > + > + fprintf(stream, "%4.4lx: ", (unsigned long)i); > + > + if (!nohex) { > + /* hex not disabled, show it */ > + for (c = 0; c < width; c++) { > + if (i + c < size) > + fprintf(stream, "%02x ", ptr[i + c]); > + else > + fputs(" ", stream); > + } > + } > + > + for (c = 0; (c < width) && (i + c < size); c++) { > + /* check for 0D0A; if found, skip past and start a new line of output */ > + if (nohex && (i + c + 1 < size) && ptr[i + c] == 0x0D && > + ptr[i + c + 1] == 0x0A) { > + i += (c + 2 - width); > + break; > + } > + fprintf(stream, "%c", > + (ptr[i + c] >= 0x20) && (ptr[i + c] < 0x80) ? ptr[i + c] : '.'); > + /* check again for 0D0A, to avoid an extra \n if it's at width */ > + if (nohex && (i + c + 2 < size) && ptr[i + c + 1] == 0x0D && > + ptr[i + c + 2] == 0x0A) { > + i += (c + 3 - width); > + break; > + } > + } > + fputc('\n', stream); /* newline */ > + } > + fflush(stream); > +} > + > +static int my_trace(CURL *handle, > + curl_infotype type, > + char *data, > + size_t size, > + void *userp) > +{ > + struct data *config = (struct data *)userp; > + const char *text = ""; > + (void)handle; /* prevent compiler warning */ > + > + switch (type) { > + case CURLINFO_TEXT: > + fprintf(stderr, "== Info: %s", data); > + /* FALLTHROUGH */ > + case CURLINFO_END: /* in case a new one is introduced to shock us */ > + break; > + case CURLINFO_HEADER_OUT: > + text = "=> Send header"; > + break; > + case CURLINFO_DATA_OUT: > + text = "=> Send data"; > + break; > + case CURLINFO_SSL_DATA_OUT: > + text = "=> Send SSL data"; > + break; > + case CURLINFO_HEADER_IN: > + text = "<= Recv header"; > + break; > + case CURLINFO_DATA_IN: > + text = "<= Recv data"; > + break; > + case CURLINFO_SSL_DATA_IN: > + text = "<= Recv SSL data"; > + break; > + } > + > + dump(text, stderr, (unsigned char *)data, size, config->trace_ascii); > + return 0; > +} I presume this is left over from when you were debugging this. Printing to stderr is not really something we would want in the code. > +static int > +virCHMonitorCurlPerform(CURL *handle) > +{ > + CURLcode errorCode; > + long responseCode = 0; > + > + struct data config; > + > + config.trace_ascii = 1; /* enable ascii tracing */ > + > + curl_easy_setopt(handle, CURLOPT_DEBUGFUNCTION, my_trace); > + curl_easy_setopt(handle, CURLOPT_DEBUGDATA, &config); > + > + /* the DEBUGFUNCTION has no effect until we enable VERBOSE */ > + curl_easy_setopt(handle, CURLOPT_VERBOSE, 1L); > + > + errorCode = curl_easy_perform(handle); > + > + if (errorCode != CURLE_OK) { > + virReportError(VIR_ERR_INTERNAL_ERROR, > + _("curl_easy_perform() returned an error: %s (%d)"), > + curl_easy_strerror(errorCode), errorCode); > + return -1; > + } > + > + errorCode = curl_easy_getinfo(handle, CURLINFO_RESPONSE_CODE, > + &responseCode); > + > + if (errorCode != CURLE_OK) { > + virReportError(VIR_ERR_INTERNAL_ERROR, > + _("curl_easy_getinfo(CURLINFO_RESPONSE_CODE) returned an " > + "error: %s (%d)"), curl_easy_strerror(errorCode), > + errorCode); > + return -1; > + } > + > + if (responseCode < 0) { > + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", > + _("curl_easy_getinfo(CURLINFO_RESPONSE_CODE) returned a " > + "negative response code")); > + return -1; > + } > + > + return responseCode; > +} > + > +int > +virCHMonitorPutNoContent(virCHMonitorPtr mon, const char *endpoint) > +{ > + char *url; Change to... g_autofree char *url = NULL; and... > + int responseCode = 0; > + int ret = -1; > + > + url = g_strdup_printf("%s/%s", URL_ROOT, endpoint); > + > + virObjectLock(mon); > + > + /* reset all options of a libcurl session handle at first */ > + curl_easy_reset(mon->handle); > + > + curl_easy_setopt(mon->handle, CURLOPT_UNIX_SOCKET_PATH, mon->socketpath); > + curl_easy_setopt(mon->handle, CURLOPT_URL, url); > + curl_easy_setopt(mon->handle, CURLOPT_PUT, true); > + curl_easy_setopt(mon->handle, CURLOPT_HTTPHEADER, NULL); > + > + responseCode = virCHMonitorCurlPerform(mon->handle); > + > + virObjectUnlock(mon); > + > + if (responseCode == 200 || responseCode == 204) > + ret = 0; > + > + VIR_FREE(url); ...you can drop this. Likewise for any where else you malloc a variable and free it in the same scope. > + return ret; > +} > + > +int > +virCHMonitorGet(virCHMonitorPtr mon, const char *endpoint) > +{ > + char *url; > + int responseCode = 0; > + int ret = -1; > + > + url = g_strdup_printf("%s/%s", URL_ROOT, endpoint); > + > + virObjectLock(mon); > + > + /* reset all options of a libcurl session handle at first */ > + curl_easy_reset(mon->handle); > + > + curl_easy_setopt(mon->handle, CURLOPT_UNIX_SOCKET_PATH, mon->socketpath); > + curl_easy_setopt(mon->handle, CURLOPT_URL, url); > + > + responseCode = virCHMonitorCurlPerform(mon->handle); > + > + virObjectUnlock(mon); > + > + if (responseCode == 200 || responseCode == 204) > + ret = 0; > + > + VIR_FREE(url); > + return ret; > +} > +int > +virCHMonitorCreateVM(virCHMonitorPtr mon) > +{ > + g_autofree char *url = NULL; > + int responseCode = 0; > + int ret = -1; > + g_autofree char *payload = NULL; > + struct curl_slist *headers = NULL; > + > + url = g_strdup_printf("%s/%s", URL_ROOT, URL_VM_CREATE); > + headers = curl_slist_append(headers, "Accept: application/json"); > + headers = curl_slist_append(headers, "Content-Type: application/json"); > + headers = curl_slist_append(headers, "Expect:"); Is that empty "Expect:" header intentional ? > + > + if (virCHMonitorBuildVMJson(mon->vm->def, &payload) != 0) > + return -1; > + > + virObjectLock(mon); > + > + /* reset all options of a libcurl session handle at first */ > + curl_easy_reset(mon->handle); > + > + curl_easy_setopt(mon->handle, CURLOPT_UNIX_SOCKET_PATH, mon->socketpath); > + curl_easy_setopt(mon->handle, CURLOPT_URL, url); > + curl_easy_setopt(mon->handle, CURLOPT_CUSTOMREQUEST, "PUT"); > + curl_easy_setopt(mon->handle, CURLOPT_HTTPHEADER, headers); > + curl_easy_setopt(mon->handle, CURLOPT_POSTFIELDS, payload); > + > + responseCode = virCHMonitorCurlPerform(mon->handle); > + > + virObjectUnlock(mon); > + > + if (responseCode == 200 || responseCode == 204) > + ret = 0; > + > + curl_slist_free_all(headers); > + VIR_FREE(url); > + VIR_FREE(payload); > + return ret; > +} > +struct _virCHMonitor { > + virObjectLockable parent; > + > + CURL *handle; Does this curl handle keep a persistent connection open to the CH process ? And is that able to be used to detect immedaitely when the CH process shuts down or crashes unexpectedly ? > + > + char *socketpath; > + > + pid_t pid; > + > + virDomainObjPtr vm; > +}; > + > +int virCHMonitorResumeVM(virCHMonitorPtr mon); > diff --git a/src/ch/ch_process.c b/src/ch/ch_process.c > new file mode 100644 > index 0000000000..15f4801549 > --- /dev/null > +++ b/src/ch/ch_process.c > @@ -0,0 +1,125 @@ > +/* > + * Copyright Intel Corp. 2020 > + * > + * ch_driver.h: header file for Cloud-Hypervisor driver functions > + * > + * This library is free software; you can redistribute it and/or > + * modify it under the terms of the GNU Lesser General Public > + * License as published by the Free Software Foundation; either > + * version 2.1 of the License, or (at your option) any later version. > + * > + * This library is distributed in the hope that it will be useful, > + * but WITHOUT ANY WARRANTY; without even the implied warranty of > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > + * Lesser General Public License for more details. > + * > + * You should have received a copy of the GNU Lesser General Public > + * License along with this library. If not, see > + * <http://www.gnu.org/licenses/>. > + */ > + > diff --git a/src/ch/meson.build b/src/ch/meson.build > new file mode 100644 > index 0000000000..e41691bc05 > --- /dev/null > +++ b/src/ch/meson.build > @@ -0,0 +1,44 @@ > +ch_driver_sources = [ > + 'ch_conf.c', > + 'ch_conf.h', > + 'ch_domain.c', > + 'ch_domain.h', > + 'ch_driver.c', > + 'ch_driver.h', > + 'ch_monitor.c', > + 'ch_monitor.h', > + 'ch_process.c', > + 'ch_process.h', > +] > + > +driver_source_files += files(ch_driver_sources) > + > +stateful_driver_source_files += files(ch_driver_sources) > + > +if conf.has('WITH_CH') > + ch_driver_impl = static_library( > + 'virt_driver_ch_impl', > + [ > + ch_driver_sources, > + ], > + dependencies: [ > + access_dep, > + curl_dep, > + log_dep, > + src_dep, > + ], > + include_directories: [ > + conf_inc_dir, > + ], > + ) > + > + virt_modules += { > + 'name': 'virt_driver_ch', > + 'link_whole': [ > + ch_driver_impl, > + ], > + 'link_args': [ > + libvirt_no_undefined, > + ], > + } > +endif We're moving to a world where each stateful virt driver has its own daemon. The rules here are enough for libvirtd, but you'll need to add rules to create a "virtchd" daemon. The qemu/meson.build will show how - look for "virt_daemons". > diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c > index 5d3ae8bb28..11b183ad2c 100644 > --- a/src/conf/domain_conf.c > +++ b/src/conf/domain_conf.c > @@ -130,6 +130,7 @@ VIR_ENUM_IMPL(virDomainVirt, > "parallels", > "bhyve", > "vz", > + "ch", > ); > > VIR_ENUM_IMPL(virDomainOS, > diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h > index 8a0f26f5c0..4dba588728 100644 > --- a/src/conf/domain_conf.h > +++ b/src/conf/domain_conf.h > @@ -139,6 +139,7 @@ typedef enum { > VIR_DOMAIN_VIRT_PARALLELS, > VIR_DOMAIN_VIRT_BHYVE, > VIR_DOMAIN_VIRT_VZ, > + VIR_DOMAIN_VIRT_CH, As mentioned earlier, not required since you're still using KVM IIUC. Overall, this looks like a reasonable start of the driver. What is the security model for the processes ? The libvirt driver here is running as root and spawning CH as root. We don't really want the VM process running as root though. It really needs to be unprivileged from a DAC POV. Obviously that's quite a bit more work for you todo, but it should be opssible to share alot of the security mgmt infrastructure that we already have for QEMU and LXC drivers. It can manage DAC permissions and apply SELinux or AppArmor MAC labelling/profiles. The other big question is around device addressing. If using PCI, then PCI address assignment logic is critical to ensure consistent guest ABI. Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|