On 22.10.2020 15:12, Peter Krempa wrote: > On Tue, Oct 20, 2020 at 16:44:07 +0300, Nikolay Shirokovskiy wrote: >> Currently in qemuProcessHandleBlockJob we either offload blockjob event >> processing to the worker thread or notify another thread that waits for >> blockjob event and that thread processes the event. But sometimes after event >> is offloaded to the worker thread we need to process the event in a different >> thread. >> >> To be able to to do it let's always set newstate/errmsg and then let whatever >> thread is first process it. >> >> Signed-off-by: Nikolay Shirokovskiy <nshirokovskiy@xxxxxxxxxxxxx> >> --- >> src/qemu/qemu_driver.c | 17 ++++------------- >> src/qemu/qemu_process.c | 20 ++++++++++++-------- >> 2 files changed, 16 insertions(+), 21 deletions(-) >> > > [...] > >> diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c >> index 6422881..4d63e7d 100644 >> --- a/src/qemu/qemu_process.c >> +++ b/src/qemu/qemu_process.c >> @@ -953,13 +953,19 @@ qemuProcessHandleBlockJob(qemuMonitorPtr mon G_GNUC_UNUSED, >> if (!(disk = qemuProcessFindDomainDiskByAliasOrQOM(vm, diskAlias, NULL))) >> goto cleanup; >> >> - job = qemuBlockJobDiskGetJob(disk); >> + if (!(job = qemuBlockJobDiskGetJob(disk))) { >> + VIR_DEBUG("creating new block job object for '%s'", diskAlias); >> + if (!(job = qemuBlockJobDiskNew(vm, disk, type, diskAlias))) > > So this actually writes out the status XML. I'm not sure if that is a > good idea to do if we don't hold the domain job. The premise is that > threads holding the job lock might have partially modified it. > Hi, Peter. Yeah, I did not notice that. There is another reason to have this patch (modified of course to prevent the issue you mentioned). Without it is just hard to synchronize block jobs correctly on reconnect. We call "query-block-jobs" to do the sync and set block jobs state based on query result. But first we can have pending events that we received before quering. So after the reconnect we have to deal with this outdated events. Second we can receive events after querying but before reconnection is finished and these events also will be offloaded. This can be an issue if we use sync mode like as in qemuMigrationSrcCancel because we won't see this events as they offloaded. I don't think qemuMigrationSrcCancel is really affected as all we want to do it just cancel block jobs. So we can miss some block job events in sync mode on reconnection and can have outdated block job events after reconnection. Probably it can be handled another way but with the approach as in this patch we can eliminate these possibilities. Also "new" block job code uses approach just as this patch namely save state changes in job so other threads can see it not just the worker thread. And this option is used by reconnection code for new block jobs. Nikolay