Hello, everyone. This is successor to [1] "[PATCH v2 RFC 0/4] qemu: replace nested job with interruptible async job state". I tried to expand on series rationale, to make better splitting and to add comments. ABSTRACT Let's use interruptible async job state instead of nested jobs to control concurrent execution of async and regular jobs. RATIONALE I'm not quite happy with current implementation of concurrent execution async and regular jobs because it is not straigthforward. What is current implementation of jobs concurrency? While async job is running any compatible regular job is allowed to run concurrently. Practically it means that as soon as async job drops the domain lock another rpc thread can run in, find domain in the domain list and start a regular job. This is what happens when for example migration drops the lock before waiting for migration completion event or before sending request to destination side. But if we are going to send command to qemu monitor and accordingly drop domain lock this concurrency is not desired because qemu monitor can not handle simultaneus commands now. (This is probably not the only reason. I guess we don't want to think about what can happen if regular job will run concurrenly for example to migration at any place. Ability to run concurrently during disk mirror or state transfer which take the most time of migration looks enough). Thus nested jobs were introduced. Now if async job wants to send command to monitor it start special nested jobs beforehand thus effectively block concurrent regular jobs. To me it sounds like hacky way to implement the fact that async job allows concurrency only in limited places. Let's call them interruptible. So I suggest introduce interruptible state for async job. Before we are going to wait for migration completion for example we enter this state allowing regular jobs run concurrently. Later when we meet migration completion condition we wait for concurrent regular job to finish if any before proceed. This final step is required in this approach by definition - when async job is not interruptible no regular jobs can run concurrenly. On practice this protects us from sending qemu monitor command from async job while concurrent regular job is not finished its qemu monitor communication yet. Sounds like this final wait can have consequenses on migration. Like what if finishing concurrent job take too much time. In this case we get increased migration downtime. Well we will get this increase with current appoach too. Because right after migration completion condition is met we fetch migration stats and wait in enter monitor function anyway until concurrent job is finished. This also demontrates that current appoach isolates async and concurrent regular job loosely. It is possible that regular job starts then async job resumes and continues its execution until it requires qemu monitor and only then regular job finishes its execution. I guess it will be easy to reason about concurrency if when concurrent regular is started async job can not continue until regular job is finished. Check patch 4 for a demonstration that job isolation is good. TESTS To test concurrency I ran about 100 times a migration of non-empty domain with shared disk in a run with concurrent domstats for this domain in a tight loop. Looks good. MISSING PARTS - Enter/exit interruptible state at the end/begin of migration phases. Until this point is done we have a bit of degradation because domain won't response for example to queries between migration phases. - Update THREADS.txt I'm going to implement missing parts as soon as RFC has overall approvement. TODO 1 Drop driver argument from calls to qemuDomainObjEnterMonitor 2 Replace qemuDomainObjEnterMonitorAsync with qemuDomainObjEnterMonitor 3 Make qemuDomainObjExitMonitor void and drop its driver argument. [1] https://www.redhat.com/archives/libvir-list/2017-May/msg00018.html Nikolay Shirokovskiy (11): qemu: introduce routines for interruptible state qemu: check interruptible state instead of taking nested job qemu: enter interruptlible state where appropriate qemu: remove nested job usage from qemuProcessStop qemu: remove unused qemuDomainObjBeginNestedJob qemu: remove QEMU_JOB_ASYNC_NESTED qemu: remove liveness check from qemuDomainObjExitMonitor qemu: drop qemuDomainObjEnterMonitorInternal qemu: drop qemuDomainObjExitMonitorInternal conf: cleanup: drop virDomainObjWait qemu: rename qemuDomainNestedJobAllowed src/conf/domain_conf.c | 19 ---- src/conf/domain_conf.h | 1 - src/libvirt_private.syms | 1 - src/qemu/qemu_domain.c | 265 ++++++++++++++++++++++++++++++---------------- src/qemu/qemu_domain.h | 17 ++- src/qemu/qemu_driver.c | 2 +- src/qemu/qemu_migration.c | 18 ++-- src/qemu/qemu_process.c | 19 +--- 8 files changed, 195 insertions(+), 147 deletions(-) -- 1.8.3.1 -- libvir-list mailing list libvir-list@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/libvir-list