On 29.10.2020 10:59, Peter Krempa wrote:
> On Thu, Oct 22, 2020 at 17:46:28 +0300, Nikolay Shirokovskiy wrote:
>>
>>
>> On 22.10.2020 15:12, Peter Krempa wrote:
>>> On Tue, Oct 20, 2020 at 16:44:07 +0300, Nikolay Shirokovskiy wrote:
>>>> Currently in qemuProcessHandleBlockJob we either offload blockjob event
>>>> processing to the worker thread or notify another thread that waits for
>>>> blockjob event and that thread processes the event. But sometimes after event
>>>> is offloaded to the worker thread we need to process the event in a different
>>>> thread.
>>>>
>>>> To be able to to do it let's always set newstate/errmsg and then let whatever
>>>> thread is first process it.
>>>>
>>>> Signed-off-by: Nikolay Shirokovskiy <nshirokovskiy@xxxxxxxxxxxxx>
>>>> ---
>>>>  src/qemu/qemu_driver.c  | 17 ++++-------------
>>>>  src/qemu/qemu_process.c | 20 ++++++++++++--------
>>>>  2 files changed, 16 insertions(+), 21 deletions(-)
>>>>
>>>
>>> [...]
>>>
>>>> diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
>>>> index 6422881..4d63e7d 100644
>>>> --- a/src/qemu/qemu_process.c
>>>> +++ b/src/qemu/qemu_process.c
>>>> @@ -953,13 +953,19 @@ qemuProcessHandleBlockJob(qemuMonitorPtr mon G_GNUC_UNUSED,
>>>>      if (!(disk = qemuProcessFindDomainDiskByAliasOrQOM(vm, diskAlias, NULL)))
>>>>          goto cleanup;
>>>>
>>>> -    job = qemuBlockJobDiskGetJob(disk);
>>>> +    if (!(job = qemuBlockJobDiskGetJob(disk))) {
>>>> +        VIR_DEBUG("creating new block job object for '%s'", diskAlias);
>>>> +        if (!(job = qemuBlockJobDiskNew(vm, disk, type, diskAlias)))
>>>
>>> So this actually writes out the status XML. I'm not sure if that is a
>>> good idea to do if we don't hold the domain job. The premise is that
>>> threads holding the job lock might have partially modified it.
>>>
>>
>> Hi, Peter.
>>
>> Yeah, I did not notice that.
>>
>> There is another reason to have this patch (modified of course to prevent the issue
>> you mentioned).
>> Without it is just hard to synchronize block jobs correctly on
>> reconnect.
>>
>> We call "query-block-jobs" to do the sync and set block jobs state based on
>> query result. But first we can have pending events that we received before
>> quering. So after the reconnect we have to deal with this outdated events.
>> Second we can receive events after querying but before reconnection is finished
>> and these events also will be offloaded. This can be an issue if we use sync
>> mode like as in qemuMigrationSrcCancel because we won't see this events as they
>> offloaded. I don't think qemuMigrationSrcCancel is really affected as all we
>> want to do it just cancel block jobs.
>
> I'm still worried about moving the code which adds the blockjob to the
> blockjob list outside of a MODIFY domain job. I considered the blockjob
> list to be modified only with the MODIFY domain job and thus this would
> at the very least require audit of all the code paths related to
> blockjobs.
>
>> So we can miss some block job events in sync mode on reconnection and can have
>> outdated block job events after reconnection. Probably it can be handled
>> another way but with the approach as in this patch we can eliminate these
>> possibilities.
>
> The legacy block job code should be robust against those though. Any
> state changes are done by redetection rather than internal state
> handling so firing a event twice should be fine. Same way the code must
> be robust against missing an event especially on reconnection. With the
> old code we don't store the job list so we don't know if we missed
> something in the first place.
>
>> Also "new" block job code uses approach just as this patch namely save
>> state changes in job so other threads can see it not just the worker thread.
>> And this option is used by reconnection code for new block jobs.
>
> Yes, but my worry is modifying the old block job code at all. Ideally
> we'll just delete it ASAP.
> My testing is nowadays limited to new qemu
> with new block job code and hopefully everybody upgrades ASAP.

Yeah, I understand. We in Virtuozzo are still based on CentOS 7 (but we
are moving to CentOS 8, whose latest updates have qemu/libvirt versions
that support the new blockjobs). I have to add some Vz-specific changes
to libvirt (to support our Virtuozzo Storage), so I inspected the old
block job code closely and thus sent these patches.

> Thus I'm very worried by changes to the old block job code and
> especially those which would impact semantics of the job handling in
> regards of the new block job code (allowing additions to the block job
> table outside of the MODIFY domain job).
>
> If you can pull off the same without adding a block job without holding
> the modify domain job I might be okay with such patch. But preferrably
> just upgrade to use the new code which makes sense to fix.
>

Well, I need to change the patches at least in our version of libvirt to
address the issues/worries you mentioned. I will send the next version
too so that you can take a look and give some advice/share your opinion.
It was helpful that for this version you noticed things I didn't.

Nikolay