On Fri, Oct 04, 2019 at 08:14:39AM +0200, Peter Krempa wrote: > On Thu, Oct 03, 2019 at 16:52:12 -0500, Jonathon Jongsma wrote: > > Some layered products such as oVirt have requested a way to avoid being > > blocked by guest agent commands when querying a loaded vm. For example, > > many guest agent commands are polled periodically to monitor changes, > > and rather than blocking the calling process, they'd prefer to simply > > time out when an agent query is taking too long. > > > > This patch adds a way for the user to specify a custom agent timeout > > that is applied to all agent commands. > > > > One special case to note here is the 'guest-sync' command. 'guest-sync' > > is issued internally prior to calling any other command. (For example, > > when libvirt wants to call 'guest-get-fsinfo', we first call > > 'guest-sync' and then call 'guest-get-fsinfo'). > > > > Previously, the 'guest-sync' command used a 5-second timeout > > (VIR_DOMAIN_QEMU_AGENT_COMMAND_DEFAULT), whereas the actual command that > > followed always blocked indefinitely > > (VIR_DOMAIN_QEMU_AGENT_COMMAND_BLOCK). As part of this patch, if a > > custom timeout is specified that is shorter than > > 5 seconds, this new timeout also used for 'guest-sync'. If there is no > > custom timeout or if the custom timeout is longer than 5 seconds, we > > will continue to use the 5-second timeout. > > > > See https://bugzilla.redhat.com/show_bug.cgi?id=1705426 for additional details. > > > > Signed-off-by: Jonathon Jongsma <jjongsma@xxxxxxxxxx> > > @@ -2737,3 +2730,19 @@ qemuAgentGetTimezone(qemuAgentPtr mon, > > > > return 0; > > } > > + > > +int > > +qemuAgentSetTimeout(qemuAgentPtr mon, > > + int timeout) > > +{ > > + if (timeout < VIR_DOMAIN_QEMU_AGENT_COMMAND_MIN) { > > + virReportError(VIR_ERR_INVALID_ARG, > > + _("guest agent timeout '%d' is " > > + "less than the minimum '%d'"), > > + timeout, VIR_DOMAIN_QEMU_AGENT_COMMAND_MIN); > > This error is misleading as -1 and -2 are special values and not actual > timeout. > > > + return -1; > > + } > > + > > + mon->timeout = timeout; > > + return 0; > > +} > > [...] > > > diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c > > index 1e041a8bac..09251cc9e2 100644 > > --- a/src/qemu/qemu_driver.c > > +++ b/src/qemu/qemu_driver.c > > @@ -23434,6 +23434,29 @@ qemuDomainGetGuestInfo(virDomainPtr dom, > > return ret; > > } > > > > +static int > > +qemuDomainQemuAgentSetTimeout(virDomainPtr dom, > > + int timeout) > > +{ > > + virDomainObjPtr vm = NULL; > > + qemuAgentPtr agent; > > + int ret = -1; > > + > > + if (!(vm = qemuDomObjFromDomain(dom))) > > + goto cleanup; > > + > > + if (virDomainQemuAgentSetTimeoutEnsureACL(dom->conn, vm->def) < 0) > > + goto cleanup; > > + > > + agent = qemuDomainObjEnterAgent(vm); > > You must acquire a job on @vm if you want to call this: > > /* > * obj must be locked before calling > * > * To be called immediately before any QEMU agent API call. > * Must have already called qemuDomainObjBeginAgentJob() or > * qemuDomainObjBeginJobWithAgent() and checked that the VM is > * still active. > * > * To be followed with qemuDomainObjExitAgent() once complete > */ > qemuAgentPtr > qemuDomainObjEnterAgent(virDomainObjPtr obj) > > > > + ret = qemuAgentSetTimeout(agent, timeout); > > + qemuDomainObjExitAgent(vm, agent); > > Also this API is inherently racy if you have two clients setting the > timeout and it will influence calls of a different client. > > IMO the only reasonable approach is to add new APIs which have a > 'timeout' parameter for any agent API which requires tunable timeout to > prevent any races and unexpected behaviour. > > Other possibility may be to add a qemu config file option for this but > that is not really dynamic. I guess the key question is whether we actually have a compelling use case to vary the timeout on a per-command basis. If not, then we could do fine with a global config that is either recorded in the domain XML, or in the global QEMU config. The possible causes of slowness are - Host is overloaded so the guest is not being scheduled - Guest is overloaded so the agent is not being scheduled - The agent has crash/deadlocked - The agent is maliciously not responding - The command genuinely takes a long time to perform its action The first 4 of those are fine with a global timeout either on guest or in the driver. Only the last one really pushes for having this per-public API. Looking commands, ones I think can take a long time are - FS trim - this can run for many minutes if there's alot to trim - FS freeze - this might need to wait for apps to quiesce their I/O IIUC - Guest exec - it can run any command The latter isn't used in libvirt, can be be run via the QGA passthrough api in libvirt-qemu.so So sadly I think we genuinely do have a need for per-commad timeouts, for at least some of the API calls. I don't think all of them need it though. I though we could likely add a qemu.conf setting for the globak timeout, but then add a timeout to individual APIs in specific cases where we know they can genuinely take a very long time to execute. We must also ensure we NEVER block any regular libvirt APIs when a guest agent comamnd is running. Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :| -- libvir-list mailing list libvir-list@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/libvir-list