On 28.11.2019 14:11, Peter Krempa wrote:
> On Thu, Nov 28, 2019 at 10:51:56 +0000, Nikolay Shirokovskiy wrote:
>>
>> On 28.11.2019 12:05, Peter Krempa wrote:
>>> On Thu, Nov 28, 2019 at 07:29:08 +0000, Nikolay Shirokovskiy wrote:
>>>>
>>>> On 27.11.2019 17:56, Peter Krempa wrote:
>>>>> On Wed, Nov 27, 2019 at 17:19:18 +0300, Nikolay Shirokovskiy wrote:
>>>>>> Due to a race, qemuDomainGetBlockJobInfo can report that there is
>>>>>> no block job for a disk, yet a later call to spawn a new block job
>>>>>> can still fail because libvirt has not yet processed the block job
>>>>>> completion internally. Thus let's wait for the block job to finish
>>>>>> if we are about to report that there is no more block job.
>>>>>
>>>>> Could you please elaborate how this happened? e.g. provide some logs?
>>>>>
>>>>> Polling with qemuDomainGetBlockJobInfo will always be racy and if the
>>>>> problem is that diskPriv->blockjob is allocated but qemu didn't report
>>>>> anything, I suspect the problem is somewhere else.
>>>>>
>>>>>>
>>>>>> Signed-off-by: Nikolay Shirokovskiy <nshirokovskiy@xxxxxxxxxxxxx>
>>>>>> ---
>>>>>>  src/qemu/qemu_driver.c | 18 +++++++++++++++++-
>>>>>>  1 file changed, 17 insertions(+), 1 deletion(-)
>>>>>>
>>>>>> diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
>>>>>> index 669c12d6ca..b148df3a57 100644
>>>>>> --- a/src/qemu/qemu_driver.c
>>>>>> +++ b/src/qemu/qemu_driver.c
>>>>>> @@ -17785,12 +17785,26 @@ qemuDomainGetBlockJobInfo(virDomainPtr dom,
>>>>>>          goto endjob;
>>>>>>      }
>>>>>>
>>>>>> +    qemuBlockJobSyncBegin(job);
>>>>>> +
>>>>>>      qemuDomainObjEnterMonitor(driver, vm);
>>>>>>      ret = qemuMonitorGetBlockJobInfo(qemuDomainGetMonitor(vm), job->name, &rawInfo);
>>>>>>      if (qemuDomainObjExitMonitor(driver, vm) < 0)
>>>>>>          ret = -1;
>>>>>> -    if (ret <= 0)
>>>>>> +    if (ret < 0)
>>>>>> +        goto endjob;
>>>>>> +
>>>>>> +    if (ret == 0) {
>>>>>
>>>>> So if qemu returns that there is no blockjob ...
>>>>>
>>>>>> +        qemuBlockJobUpdate(vm, job, QEMU_ASYNC_JOB_NONE);
>>>>>> +        while (qemuBlockJobIsRunning(job)) {
>>>>>
>>>>> but we think it's still running, then most likely the event arrived but
>>>>> wasn't processed yet. In that case this would help, but the outcome
>>>>> would not differ much from the scenario without this code, as the
>>>>> event would be processed later anyway.
>>>>>
>>>>>> +            if (virDomainObjWait(vm) < 0) {
>>>>>> +                ret = -1;
>>>>>> +                goto endjob;
>>>>>> +            }
>>>>>> +            qemuBlockJobUpdate(vm, job, QEMU_ASYNC_JOB_NONE);
>>>>>
>>>>> If there's a bug where libvirt thinks there's a job but qemu doesn't
>>>>> know about one, and qemu is right, this will lock up here as the
>>>>> event will never arrive.
>>>>>
>>>>>> +        }
>>>>>>          goto endjob;
>>>>>
>>>>> My suggested solution would be to fake the job from the data in 'job'
>>>>> rather than attempting to wait in this case, e.g.:
>>>>>
>>>>>     if (ret == 0) {
>>>>>         rawInfo.type = job->type;
>>>>>         rawInfo.ready = 0;
>>>>>     }
>>>>>
>>>>> and then let it make it to qemuBlockJobInfoTranslate, which sets some
>>>>> fake progress data so that the job looks active.
>>>>>
>>>>> That way you get a response that there's a job without the potential to
>>>>> deadlock.
>>>>
>>>> Fake progress data does not look nice. Maybe we should cache the progress
>>>> on qemuDomainGetBlockJobInfo calls so it will not go backwards after the
>>>> patch?
>>>
>>> I agree, but it's always a tradeoff between the complexity of the code and
>>> its benefit. If it's a very unlikely scenario, returning fake data is
>>> simpler than adding caching and persistence over restarts.
>>>
>>>> I can understand protecting against possible bugs when recovering is not
>>>> possible or difficult, like a case of data corruption. But in this case
>>>> it is a matter of a libvirtd restart, I guess.
>>>
>>> Well, that's why I'm asking what the original bug is. Maybe the problem
>>> lies somewhere else and there's a more elegant fix.
>>
>> I have mgmt polling for blockpull job completion. After completion,
>> failure/success is inferred by inspecting the domain xml. So even on
>> success, due to the race, we can see the backing chain that should have
>> been deleted by the blockpull and decide it was a failure.
>
> The strongly preferred method to figure out the success of a job is to use
> the VIR_DOMAIN_EVENT_ID_BLOCK_JOB_2 event, which is delivered after the
> block job is finalized by libvirt.
>
> This means that any backing chain modification/redetection is finished by
> the time it's delivered, and additionally the event reports the success
> state.
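
For reference, this is roughly what the event-driven side of the mgmt app
would look like, I suppose. Just a sketch; the connection URI, the callback
name and the plain printf handling here are illustrative, not taken from
our actual code:

#include <stdio.h>
#include <libvirt/libvirt.h>

/* Signature matches virConnectDomainEventBlockJobCallback.  For
 * VIR_DOMAIN_EVENT_ID_BLOCK_JOB_2 'disk' holds the disk target name
 * (e.g. "vda") rather than the image path. */
static void
blockJobCallback(virConnectPtr conn,
                 virDomainPtr dom,
                 const char *disk,
                 int type,
                 int status,
                 void *opaque)
{
    (void)conn;
    (void)opaque;

    /* By the time this fires libvirt has already finalized the job,
     * so the domain XML reflects the updated backing chain. */
    if (status == VIR_DOMAIN_BLOCK_JOB_COMPLETED)
        printf("job (type %d) on %s:%s completed\n",
               type, virDomainGetName(dom), disk);
    else if (status == VIR_DOMAIN_BLOCK_JOB_FAILED)
        printf("job on %s:%s failed\n", virDomainGetName(dom), disk);
}

int
main(void)
{
    virConnectPtr conn;
    int cbid;

    /* an event loop implementation must be registered before the
     * connection is opened */
    if (virEventRegisterDefaultImpl() < 0)
        return 1;

    if (!(conn = virConnectOpen("qemu:///system")))
        return 1;

    cbid = virConnectDomainEventRegisterAny(conn,
                                            NULL, /* NULL = all domains */
                                            VIR_DOMAIN_EVENT_ID_BLOCK_JOB_2,
                                            VIR_DOMAIN_EVENT_CALLBACK(blockJobCallback),
                                            NULL, NULL);
    if (cbid < 0)
        return 1;

    /* ... start the blockpull here, then let the loop deliver events ... */
    while (virEventRunDefaultImpl() == 0)
        ;

    virConnectDomainEventDeregisterAny(conn, cbid);
    virConnectClose(conn);
    return 0;
}

With something like this the domain XML only needs to be inspected, if at
all, after the event has arrived, so the race above disappears.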
> Additionally, since libvirt 5.10 and qemu 4.2, blockdev is used, which
> impacts block jobs. With blockdev used, we persist the jobs in qemu until
> we process their state. This means that the above race is impossible with
> current qemu, as the block job would still be present in the output of
> query-block-jobs until we process the event from qemu and thus modify the
> backing chain in the XML during one lock.
>
> In the pre-blockdev era I can see that there's a possibility that it could
> happen, but in my opinion it's extremely unlikely. The race window is
> between qemu removing the job and delivering the event. After the event
> is delivered we enqueue it in the job handling, and thus it will be
> processed in order. This means that the code would have to call
> virDomainGetBlockJobInfo at a point where the info from qemu is no longer
> present, but that also means that qemu has already sent the event
> notifying about the finished job. The window where the virDomainGetXMLDesc
> call would have to take place is between those two actions.
>
> Given that with the newest qemu and libvirt the situation will not happen,
> and that it's otherwise unlikely, I'm against adding the complex code you
> proposed. We can go with the fake progress data if you insist on fixing
> the polling scenario, which is suboptimal by itself.

Thanks for the elaborate answer. In the long term the problem is solved by
block jobs being reaped explicitly, so I'm ok with dismissing the patch.

Nikolay

--
libvir-list mailing list
libvir-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/libvir-list