ping

On 18.10.2018 11:55, Nikolay Shirokovskiy wrote:
>
> On 17.10.2018 23:04, John Ferlan wrote:
>>
>> On 10/16/18 3:22 AM, Nikolay Shirokovskiy wrote:
>>>
>>> On 16.10.2018 03:00, John Ferlan wrote:
>>>>
>>>> On 10/8/18 4:10 AM, Nikolay Shirokovskiy wrote:
>>>>> The block job abort operation cannot properly handle qemu crashes
>>>>> that happen while waiting for abort/pivot completion. The deadlock
>>>>> scenario is as follows:
>>>>>
>>>>> - qemuDomainBlockJobAbort waits for pivot/abort completion
>>>>> - qemu crashes, then qemuProcessBeginStopJob broadcasts on the VM
>>>>>   condition and then waits for the job condition (taken by
>>>>>   qemuDomainBlockJobAbort)
>>>>> - qemuDomainBlockJobAbort wakes up but nothing has really changed,
>>>>>   the VM is still active (vm->def->id != -1), so the thread starts
>>>>>   waiting for completion again. Now the two threads are deadlocked.
>>>>>
>>>>> First, let's remove the broadcast in qemuProcessBeginStopJob. It is
>>>>> simply wrong because it does not change any state before
>>>>> broadcasting, so awakened threads cannot detect any changes. A
>>>>> domain crashing during an async job will continue to be handled
>>>>> properly because the destroy job can run concurrently with the
>>>>> async job, and the destroy job calls qemuProcessStop, which sets
>>>>> vm->def->id to -1 and broadcasts.
>>>>
>>>> Hmm... Although blockjobs are not my area of expertise, I do seem to
>>>> have a knack for reading and commenting on patches with these edge
>>>> conditions.
>>>>
>>>> At first, taken alone, this made it seem like separate patches are
>>>> required, but maybe not, depending on the relationship described
>>>> above. As an aside, for this paragraph hunk you could call out
>>>> commit 4d0c535a3 where this is/was introduced. Beyond the refactor,
>>>> the broadcast was added; however, it seems it was done so on purpose
>>>> since the broadcast would seemingly allow something to be awoken.
>>>>
>>>> Beyond that - take away the scenario you are describing where QEMU
>>>> crashes. In the normal path, if you remove the broadcast, then do
>>>> things work properly?
>>>
>>> As far as I can see. In all jobs where we wait on the vm condition we
>>> check misc state variables after that, so if the state has not
>>> changed then broadcasting will not help. (The only exception is
>>> migration and derivatives with qemu not supporting events, but in
>>> that case we use sleeps and do not wait.)
>>>
>>
>> To be clear, you are referencing virDomainObjWait[Until] callers that
>> aren't found within qemu_migration.c.
>
> Not exactly. I mean look at any code that waits with
> virDomainObjWait[Until]. The removed broadcast won't help them finish
> waiting because the only action they take after waking up is checking
> a state variable. They don't, for example, send monitor commands, in
> which case they would get a "monitor closed" error and finish waiting.
>
>>
>> The two qemu_hotplug.c examples are waiting for QEMU events related to
>> tray eject or device removal, but are limited in their duration. If
>> they don't get the event in the specified time they have their means
>> to signify the timeout.
>>
>> The two qemu_driver.c examples are possibly waiting forever. One waits
>> for an external event to signify a memory dump is complete via
>> qemuProcessHandleDumpCompleted or the job aborted. The other waits for
>> the blockjob to be completed when qemuBlockJobEventProcess clears the
>> flag. Both should also theoretically fail when the domain or qemu
>> dies; however, both use virDomainObjWait, which, when properly tickled
>> by a broadcast, will call virDomainObjIsActive to check
>> vm->def->id != -1.
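The condition-variable discipline at issue here reduces to the classic
predicate loop. The following is a minimal, self-contained model in plain
pthreads (invented names, not libvirt code) of why a broadcast that changes
no state cannot release a waiter: the waiter re-checks its predicate on
every wakeup and simply goes back to sleep.

    #include <pthread.h>
    #include <stdbool.h>
    #include <stdio.h>

    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
    static bool blockjob_running = true;  /* stands in for diskPriv->blockjob */

    static void *
    waiter(void *opaque)
    {
        (void)opaque;
        pthread_mutex_lock(&lock);
        while (blockjob_running)          /* predicate re-checked on each wakeup */
            pthread_cond_wait(&cond, &lock);
        pthread_mutex_unlock(&lock);
        return NULL;
    }

    static void *
    stopper(void *opaque)
    {
        (void)opaque;
        pthread_mutex_lock(&lock);
        /* Broadcasting without this assignment would wake the waiter only
         * for it to re-enter pthread_cond_wait(): exactly the useless
         * wakeup the patch removes from qemuProcessBeginStopJob. */
        blockjob_running = false;
        pthread_cond_broadcast(&cond);
        pthread_mutex_unlock(&lock);
        return NULL;
    }

    int
    main(void)
    {
        pthread_t w, s;
        pthread_create(&w, NULL, waiter, NULL);
        pthread_create(&s, NULL, stopper, NULL);
        pthread_join(w, NULL);
        pthread_join(s, NULL);
        printf("waiter released\n");
        return 0;
    }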
>>
>> So the contention (for both I think) is that because the = -1 is not
>> done when QEMU is killed we're stuck. So rather than wait on something
>> that won't happen - use the EOF event as a way to force an exit once a
>> broadcast happens via a qemu_domain specific qemuDomainObjWait.
>>
>> Is that a "fair summary"?
>>
>
> Yes. But this approach is taken only for non-async jobs. Async jobs
> are OK because they can run concurrently with the destroy job.
>
>>>>
>>>> Since a block job would set @priv->blockjob when a block job starts,
>>>> and it is cleared during qemuBlockJobEventProcess processing when
>>>> the status is VIR_DOMAIN_BLOCK_JOB_COMPLETED,
>>>> VIR_DOMAIN_BLOCK_JOB_FAILED, or VIR_DOMAIN_BLOCK_JOB_CANCELED.
>>>>
>>>> What about setting a @priv->blockjobAbort when the abort starts?
>>>> Then perhaps processMonitorEOFEvent or qemuProcessHandleMonitorEOF
>>>> can handle that properly so that we don't deadlock.
>>>
>>> But how can we handle it the other way? I see no other option now
>>> besides setting some state variable and signalling after that, to
>>> help the non-async job finish, and then letting the EOF handler
>>> proceed. (However, check the suggestions after --- )
>>>
>>
>> OK, so perhaps that's just a rename of your @monEOF, but specific to
>> the example; yet based on what I typed above it would seemingly not be
>> enough, so let's stick with monEOF...
>>
>>> Nikolay
>>>
>>>>
>>>> Perhaps or hopefully, Jirka or Peter will comment too with this bump.
>>>>
>>>> John
>>>>
>>>>>
>>>>> Second, let's introduce a flag that EOF has been received, and
>>>>> broadcast after setting it. Now non-async jobs can check this flag
>>>>> in their wait loop.
>>>>>
>>>>> Signed-off-by: Nikolay Shirokovskiy <nshirokovskiy@xxxxxxxxxxxxx>
>>>>>
>>>>> ---
>>>>>
>>>>> Diff from v1:
>>
>> Just making sure - this RFC v2 comes from a series in Apr/May of this
>> year:
>>
>> https://www.redhat.com/archives/libvir-list/2018-April/msg01752.html
>>
>> w/ review dialog spilling into May:
>> https://www.redhat.com/archives/libvir-list/2018-May/msg00126.html
>>
>> based on the series:
>>
>> https://www.redhat.com/archives/libvir-list/2018-April/msg01713.html
>>
>> that you SNACK'd.
>>
>> An awfully long time to remember context! When you reference earlier
>> patches, please try to remember to place a link in the cover - it
>> makes it easier to find, even if it's only a couple of days. Weeks and
>> months from now someone may reference the series and want to peruse
>> the history of the previous review comments.
>
> I will.
>
>>
>>>>>
>>>>> - patches 1 and 2 are already merged
>>>>> - don't bother with reporting the monitor EOF reason to the user,
>>>>>   as most of the time it is simply "unexpected eof" (this implies
>>>>>   dropping patch 3)
>>>>> - drop patch 5 as we now always report "domain is being stopped"
>>>>>   in qemuDomainObjWait
>>>>> - don't signal on monitor error, for simplicity (otherwise we need
>>>>>   to report something more elaborate than "domain is being
>>>>>   stopped", as we don't kill the domain on monitor errors. On the
>>>>>   other hand, I guess a monitor error is a rare case to handle
>>>>>   right now)
>>>>> - keep virDomainObjWait for async jobs
>>>>>
>>>>> It's a bit uneven that for async jobs the domain is destroyed
>>>>> concurrently, while for non-async jobs it will actually be
>>>>> destroyed after the job completes. Also, if a non-async job needs
>>>>> to issue commands to qemu on cleanup, then we will send these
>>>>> commands in vain, polluting logs etc., because the qemu process is
>>>>> not running at this moment but the typical check
>>>>> (virDomainObjIsActive) will think it is still running.
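As an aside on the "commands sent in vain" point: a cleanup path that wants
to avoid this could, hypothetically, consult the new flag rather than the
liveness check. A sketch only; the helper name is invented and nothing like
it is part of this patch:

    static bool
    qemuDomainMonitorUsable(virDomainObjPtr vm)
    {
        qemuDomainObjPrivatePtr priv = vm->privateData;

        /* virDomainObjIsActive() alone is not enough here: after a qemu
         * crash the domain still looks active (vm->def->id != -1) until
         * the destroy job finally runs qemuProcessStop(). */
        return virDomainObjIsActive(vm) && !priv->monEOF;
    }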
>>>>>
>>>>> The domain is destroyed (qemuProcessStop) in a job due to patches
>>>>> [1] and [2]. However, AFAIU it is not necessary. If qemuProcessStop
>>>>> does not drop the VM lock, then we don't need an extra job to keep
>>>>> qemuProcessStop and the main job from interleaving. And the lock is
>>>>> now dropped only in qemuDomainObjBeginNestedJob in qemuProcessStop,
>>>>> which is introduced in [2]. AFAIU we can fix the issues mentioned
>>>>> in [2] another way, for example like it is done for the qemu agent:
>>>>> we save the agent monitor reference on the stack for
>>>>> entering/exiting the agent monitor.
>>>>>
>>>>> So I wonder whether, instead of this fix, we can remove the job
>>>>> around qemuProcessStop and run domain destruction concurrently for
>>>>> non-async jobs too.
>>>>>
>>>>> [1]
>>>>> commit 8c9ff9960b29d4703a99efdd1cadcf6f48799cc0
>>>>> Author: Jiri Denemark <jdenemar@xxxxxxxxxx>
>>>>> Date:   Thu Feb 11 15:32:48 2016 +0100
>>>>>
>>>>>     qemu: Process monitor EOF in a job
>>>>>
>>>>> [2]
>>>>> commit 81f50cb92d16643bcd749e3ab5b404b8b7cec643
>>>>> Author: Jiri Denemark <jdenemar@xxxxxxxxxx>
>>>>> Date:   Thu Feb 11 11:20:28 2016 +0100
>>>>>
>>>>>     qemu: Avoid calling qemuProcessStop without a job
>>>>>
>>>>>  src/qemu/qemu_domain.c  | 39 +++++++++++++++++++++++++++++++++++++++
>>>>>  src/qemu/qemu_domain.h  |  4 ++++
>>>>>  src/qemu/qemu_driver.c  |  2 +-
>>>>>  src/qemu/qemu_hotplug.c |  4 ++--
>>>>>  src/qemu/qemu_process.c |  9 +++++----
>>>>>  5 files changed, 51 insertions(+), 7 deletions(-)
>>>>>
>>>>> diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c
>>>>> index 939b2a3..aead72b 100644
>>>>> --- a/src/qemu/qemu_domain.c
>>>>> +++ b/src/qemu/qemu_domain.c
>>>>> @@ -13534,3 +13534,42 @@ qemuDomainRunningReasonToResumeEvent(virDomainRunningReason reason)
>>>>>
>>>>>      return VIR_DOMAIN_EVENT_RESUMED_UNPAUSED;
>>>>>  }
>>>>> +
>>>>> +
>>>>> +/**
>>>>> + * Waits for domain condition to be triggered for a specific period of time.
>>>>> + * if @until is 0 then waits indefinetely.
>>
>> *indefinitely
>>
>>>>> + *
>>>>> + * Returns:
>>>>> + *  -1 on error
>>>>> + *   0 on success
>>>>> + *   1 on timeout
>>>>> + */
>>>>> +int
>>>>> +qemuDomainObjWait(virDomainObjPtr vm, unsigned long long until)
>>
>> Each argument on its own line.
>>
>>>>> +{
>>>>> +    qemuDomainObjPrivatePtr priv = vm->privateData;
>>>>> +    int rc;
>>>>> +
>>>>> +    if (until)
>>>>> +        rc = virCondWaitUntil(&vm->cond, &vm->parent.lock, until);
>>>>> +    else
>>>>> +        rc = virCondWait(&vm->cond, &vm->parent.lock);
>>>>> +
>>>>> +    if (rc < 0) {
>>>>> +        if (until && errno == ETIMEDOUT)
>>>>> +            return 1;
>>>>> +
>>>>> +        virReportSystemError(errno, "%s",
>>>>> +                             _("failed to wait for domain condition"));
>>>>> +        return -1;
>>>>> +    }
>>>>> +
>>>>> +    if (priv->monEOF) {
>>>>> +        virReportError(VIR_ERR_OPERATION_FAILED, "%s",
>>>>> +                       _("domain is being stopped"));
>>
>> "monitor has been closed"
>>
>
> Well, I thought the user is unaware of the concept of a monitor. We do
> have commands like qemu-monitor-command, but they are targeted at
> devs/devops I guess. Nevertheless, the original message is a bit
> confusing too (being stopped???) :)
>
>>>>> +        return -1;
>>>>> +    }
>>
>> No chance that vm->def->id could be set to -1 during any of this,
>> right? Perhaps it doesn't hurt to check virDomainObjIsActive too.
>> Paranoia, you know...
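For illustration, the belt-and-braces variant suggested here would only
mean widening the exit check in qemuDomainObjWait. A sketch of the
suggestion, not of what the patch as posted does:

    if (priv->monEOF || !virDomainObjIsActive(vm)) {
        virReportError(VIR_ERR_OPERATION_FAILED, "%s",
                       _("domain is being stopped"));
        return -1;
    }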
>>
>
> I'm afraid this can become dead code that we will be reluctant to
> touch because we don't understand why it exists :) I see no
> possibility of missing monEOF but hitting the virDomainObjIsActive
> check now. Anyway, before this patch we would get a deadlock anyway,
> so it won't get any worse if we don't add the check you suggest. If we
> really find situations where it would be useful, then we will add it
> immediately.
>
>>>>> +
>>>>> +    return 0;
>>>>> +}
>>>>> diff --git a/src/qemu/qemu_domain.h b/src/qemu/qemu_domain.h
>>>>> index 2f8a1bf..36ab294 100644
>>>>> --- a/src/qemu/qemu_domain.h
>>>>> +++ b/src/qemu/qemu_domain.h
>>>>> @@ -281,6 +281,7 @@ struct _qemuDomainObjPrivate {
>>>>>      virDomainChrSourceDefPtr monConfig;
>>>>>      bool monJSON;
>>>>>      bool monError;
>>>>> +    bool monEOF;
>>>>>      unsigned long long monStart;
>>>>>
>>>>>      qemuAgentPtr agent;
>>>>> @@ -1085,4 +1086,7 @@ void qemuDomainStorageIdReset(qemuDomainObjPrivatePtr priv);
>>>>>  virDomainEventResumedDetailType
>>>>>  qemuDomainRunningReasonToResumeEvent(virDomainRunningReason reason);
>>>>>
>>>>> +int
>>>>> +qemuDomainObjWait(virDomainObjPtr vm, unsigned long long until);
>>>>> +
>>>>>  #endif /* __QEMU_DOMAIN_H__ */
>>
>> All of the above can be its own patch, "qemu: Introduce
>> qemuDomainObjWait", with a commit message to describe why this would
>> be used instead of virDomainObjWait, especially considering the QEMU
>> events and the inability to get the exit criteria set by
>> virDomainObjIsActive of vm->def->id != -1.
>>
>>>>> diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
>>>>> index b238309..f4250da 100644
>>>>> --- a/src/qemu/qemu_driver.c
>>>>> +++ b/src/qemu/qemu_driver.c
>>>>> @@ -17142,7 +17142,7 @@ qemuDomainBlockJobAbort(virDomainPtr dom,
>>>>>          qemuDomainDiskPrivatePtr diskPriv = QEMU_DOMAIN_DISK_PRIVATE(disk);
>>>>>          qemuBlockJobUpdate(vm, QEMU_ASYNC_JOB_NONE, disk, NULL);
>>>>>          while (diskPriv->blockjob) {
>>>>> -            if (virDomainObjWait(vm) < 0) {
>>>>> +            if (qemuDomainObjWait(vm, 0) < 0) {
>>>>>                  ret = -1;
>>>>>                  goto endjob;
>>>>>              }
>>>>> diff --git a/src/qemu/qemu_hotplug.c b/src/qemu/qemu_hotplug.c
>>>>> index 4558a3c..8189629 100644
>>>>> --- a/src/qemu/qemu_hotplug.c
>>>>> +++ b/src/qemu/qemu_hotplug.c
>>>>> @@ -165,7 +165,7 @@ qemuHotplugWaitForTrayEject(virDomainObjPtr vm,
>>>>>          return -1;
>>>>>
>>>>>      while (disk->tray_status != VIR_DOMAIN_DISK_TRAY_OPEN) {
>>>>> -        if ((rc = virDomainObjWaitUntil(vm, now + CHANGE_MEDIA_TIMEOUT)) < 0)
>>>>> +        if ((rc = qemuDomainObjWait(vm, now + CHANGE_MEDIA_TIMEOUT)) < 0)
>>>>>              return -1;
>>>>>
>>>>>          if (rc > 0) {
>>>>> @@ -5002,7 +5002,7 @@ qemuDomainWaitForDeviceRemoval(virDomainObjPtr vm)
>>>>>          until += qemuDomainRemoveDeviceWaitTime;
>>>>>
>>>>>      while (priv->unplug.alias) {
>>>>> -        if ((rc = virDomainObjWaitUntil(vm, until)) == 1)
>>>>> +        if ((rc = qemuDomainObjWait(vm, until)) == 1)
>>>>>              return 0;
>>>>>
>>>>>          if (rc < 0) {
>>>>> diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
>>>>> index 29b0ba1..dd03269 100644
>>>>> --- a/src/qemu/qemu_process.c
>>>>> +++ b/src/qemu/qemu_process.c
>>>>> @@ -290,9 +290,12 @@ qemuProcessHandleMonitorEOF(qemuMonitorPtr mon,
>>>>>
>>>>>      virObjectLock(vm);
>>>>>
>>>>> +    priv = vm->privateData;
>>
>> We could just initialize this at the top since we don't even check !vm
>> first anyway.
>>
>>>>> +    priv->monEOF = true;
>>>>> +    virDomainObjBroadcast(vm);
>>
>> So why Broadcast here? Why not just let processMonitorEOFEvent and the
>> call to qemuProcessBeginStopJob handle this?
>
> Because qemuProcessBeginStopJob cannot begin the job; it is occupied
> by the waiter.
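Putting the two review points together, the top of the handler would read
roughly as follows (a sketch assuming the DEBUG reordering is adopted, not
the final committed hunk). The key constraint is that the event-loop thread
cannot take the job the waiter holds, so it must flag EOF and broadcast
itself:

    virObjectLock(vm);
    priv = vm->privateData;   /* initialized up front; vm is never NULL here */

    VIR_DEBUG("Received EOF on %p '%s'", vm, vm->def->name);

    priv->monEOF = true;
    virDomainObjBroadcast(vm);    /* wakes qemuDomainObjWait() callers, which
                                   * now see monEOF and bail out of the job */

    /* the rest of the handler (beingDestroyed check, EOF event queueing)
     * continues unchanged */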
>
>>
>> Also, let's place this after the following DEBUG message - nice to
>> keep those up as high as possible.
>>
>>>>> +
>>>>>      VIR_DEBUG("Received EOF on %p '%s'", vm, vm->def->name);
>>>>>
>>>>> -    priv = vm->privateData;
>>>>>      if (priv->beingDestroyed) {
>>>>>          VIR_DEBUG("Domain is being destroyed, EOF is expected");
>>>>>          goto cleanup;
>>>>> @@ -5996,6 +5999,7 @@ qemuProcessPrepareDomain(virQEMUDriverPtr driver,
>>>>>
>>>>>      priv->monJSON = true;
>>>>>      priv->monError = false;
>>>>> +    priv->monEOF = false;
>>>>>      priv->monStart = 0;
>>>>>      priv->gotShutdown = false;
>>>>>      priv->runningReason = VIR_DOMAIN_RUNNING_UNKNOWN;
>>>>> @@ -6965,9 +6969,6 @@ qemuProcessBeginStopJob(virQEMUDriverPtr driver,
>>>>>      if (qemuProcessKill(vm, killFlags) < 0)
>>>>>          goto cleanup;
>>>>>
>>>>> -    /* Wake up anything waiting on domain condition */
>>>>> -    virDomainObjBroadcast(vm);
>>>>> -
>>
>> Since this is called by more than just processMonitorEOFEvent, I would
>> think removing it could cause issues for qemuProcessAutoDestroy and
>> qemuDomainDestroyFlags. What would cause their possibly blocked
>> blockjob or external memory dump to be notified of this event?
>>
>
> Removing it won't hurt. Autodestroy and destroy both first kill qemu
> and then try to acquire the job. If an async job is active, then they
> just acquire the destroy job immediately because the async job lets
> destroy run concurrently. If a non-async job is active, then we get
> EOF first, which, after this patch, lets the non-async job finish, and
> then destroy/autodestroy can proceed with stopping the domain. The
> removed broadcast just causes a spurious wakeup of the waiter, I think.
>
> I would also like to discuss the alternative approach described just
> before the patch diff stats. It looks more appropriate to me, but we
> need a comment from Jiri, I guess, to follow it.
>
> Nikolay
>
>>
>>>>>      if (qemuDomainObjBeginJob(driver, vm, job) < 0)
>>>>>          goto cleanup;
>>>>>
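For completeness, the calling convention of the new helper, mirroring the
qemuDomainWaitForDeviceRemoval() hunk above (a sketch of the pattern rather
than additional patch content):

    while (priv->unplug.alias) {
        int rc = qemuDomainObjWait(vm, until);

        if (rc == 1)   /* timeout: the device was simply not removed yet */
            return 0;

        if (rc < 0)    /* error, including the new "domain is being stopped"
                        * exit taken once monitor EOF has been seen */
            return -1;
    }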