Hello Daniel and all @libvirt, reviving this old thread for multifd save/restore (for high performance migration of a large VM to an nvme disk and viceversa), now that we have made substantial progress on point b) below :-) There is a lot of ongoing work in QEMU spearheaded by Fabiano to migrate in an optimized way now to/from disk by leveraging among others multifd and file offsets. A couple questions that come to mind for the libvirt side though. In terms of our use case, we would need to trigger these migrations from virsh save, restore, managedsave / start. 1) Can you confirm this is still a good target? It would seem right from my perspective to hook up save/restore first, and then reuse the same mechanism for managedsave / start. 2) Do we expect to pass filename or file descriptor from libvirt into QEMU? As is, libvirt today generally passes an already opened file descriptor to QEMU for migrations, roughly: {"execute": "getfd", "arguments": {"fdname":"migrate"}} (passing the already open fd from libvirt f.e. 10) {"execute": "migrate", "arguments": {"detach":true,"blk":false,"inc":false,"uri":"fd:migrate"}}' Do we want to change libvirt to migrate to a file: URI ? Does this have consequence for "labeling" / security sandboxing? Or would it be better to continue opening the fd in libvirt, writing the libvirt header, and then passing the existing open fd to QEMU, using QMP command "getfd", followed by "migrate"? In this second case we would need to inform QEMU of the offset into the already open fd. Internally, the QEMU multifd code just reads and writes using pread, pwrite, so there is in any case just one fd to worry about, but who should own it, libvirt or QEMU? 3) How do we deal with O_DIRECT? In the prototype we were setting the O_DIRECT on the fd from libvirt in response to the user request for --bypass-cache, which is needed 99% of the time with large VMs. I think I remember that we plan to write from libvirt normally (without O_DIRECT) and then set the flag later, but should libvirt or QEMU set the O_DIRECT flag? This likely depends on who owns the fd? Thank you for your thoughts from the libvirt side and your help in resurrecting this topic, Claudio On 6/7/22 11:19, Claudio Fontana wrote: > This is v11 of the multifd save prototype, which focuses on saving > to a single file instead of requiring multiple separate files for > multifd channels. > > This series demonstrates a way to save and restore from a single file > by using interleaved channels of a size equal to the transfer buffer > size (1MB), and relying on UNIX holes to avoid wasting physical disk > space due to channels of different size. > > KNOWN ISSUES: > > a) still applies only to save/restore (no managed save etc) > > b) this is not not done in QEMU, where it could be possible to teach > QEMU to migrate directly to a file or block device in a > block-aligned way by altering the migration stream code and the > state migration code of all devices. > > > changes from v10: > > * virfile: add new API virFileDiskCopyChannel, which extends the > existing virFileDiskCopy to work with parallel channels in the file. > > * drop use of virthread API, use GLIB for threads. > > * pass only a single FD to the multifd-helper, which will then open > additional FDs as required for the multithreaded I/O. > > * simplify virQEMUSaveFd API, separating the initialization from the > addition of extra channels. > > * adapt all documentation to mention a single file instead of multiple. > > * remove the "Lim" versions of virFileDirectRead and Write, they are > not needed. > > --- > > changes from v9: > > * exposed virFileDirectAlign > > * separated the >= 2 QEMU_SAVE_VERSION change in own patch > > * reworked the write code to add the alignment padding to the > data_len, making the on disk format compatible when loaded > from an older libvirt. > > * reworked the read code to use direct I/O APIs only for actual > direct I/O file descriptors, so as to make old images work > with newer libvirt. > > --- > > changes from v8: > > * rebased on master > > * reordered patches to add more upstreamable content at the start > > * split introduction of virQEMUSaveFd, so the first part is multifd-free > > * new virQEMUSaveDataRead as a mirror of virQEMUSaveDataWrite > > * introduced virFileDirect API, using it in virFileDisk operations and > for virQEMUSaveRead and virQEMUSaveWrite > > --- > > changes from v7: > > * [ base params API and iohelper refactoring upstreamed ] > > * extended the QEMU save image format more, to record the nr > of multifd channels on save. Made the data header struct packed. > > * removed --parallel-connections from the restore command, as now > it is useless due to QEMU save image format extension. > > * separate out patches to expose migration_params APIs to saveimage, > including qemuMigrationParamsSetString, SetCap, SetInt. > > * fixed bugs in the ImageOpen patch (missing saveFd init), removed > some whitespace, and fixed some convoluted code paths for return > value -3. > > --- > > changes from v6: > > * improved error path handling, with error messages and especially > cancellation of qemu process on error during restore. > > * split patches more and reordered them to keep general refactoring > at the beginning before the --parallel stuff is introduced. > > * improved multifd compression support, including adding an enum > and extending the QEMU save image format to record the compression > used on save, and pick it up automatically on restore. > > --- > > changes from v4: > > * runIO renamed to virFileDiskCopy and rethought arguments > > * renamed new APIs from ...ParametersFlags to ...Params > > * introduce the new virDomainSaveParams and virDomainRestoreParams > without any additional parameters, so they can be upstreamed first. > > * solved the issue in the gendispatch.pl script generating code that > was missing the conn parameter. > > --- > > changes from v3: > > * reordered series to have all helper-related change at the start > > * solved all reported issues from ninja test, including documentation > > * fixed most broken migration capabilities code (still imperfect likely) > > * added G_GNUC_UNUSED as needed > > * after multifd restore, added what I think were the missing operations: > > qemuProcessRefreshState(), > qemuProcessStartCPUs() - most importantly, > virDomainObjSave() > > The domain now starts running after restore without further encouragement > > * removed the sleep(10) from the multifd-helper > > --- > > changes from v2: > > * added ability to restore the VM from disk using multifd > > * fixed the multifd-helper to work in both directions, > assuming the need to listen for save, and connect for restore. > > * fixed a large number of bugs, and probably introduced some :-) > > --- > > Claudio Fontana (33): > virfile: introduce virFileDirect APIs > virfile: use virFileDirect API in runIOCopy > qemu: saveimage: rework image read/write to be O_DIRECT friendly > qemu: saveimage: assume future formats will also support compression > virfile: virFileDiskCopy: prepare for O_DIRECT files without wrapper > qemu: saveimage: introduce virQEMUSaveFd > qemu: saveimage: convert qemuSaveImageCreate to use virQEMUSaveFd > qemu: saveimage: convert qemuSaveImageOpen to use virQEMUSaveFd > tools: prepare doSave to use parameters > tools: prepare cmdRestore to use parameters > libvirt: add new VIR_DOMAIN_SAVE_PARALLEL flag and parameter > qemu: add stub support for VIR_DOMAIN_SAVE_PARALLEL in save > qemu: add stub support for VIR_DOMAIN_SAVE_PARALLEL in restore > virfile: add new API virFileDiskCopyChannel > multifd-helper: new helper for parallel save/restore > qemu: saveimage: update virQEMUSaveFd struct for parallel save > qemu: saveimage: wire up saveimage code with the multifd helper > qemu: capabilities: add multifd to the probed migration capabilities > qemu: saveimage: add multifd related fields to save format > qemu: migration_params: add APIs to set Int and Cap > qemu: migration: implement qemuMigrationSrcToFilesMultiFd for save > qemu: add parameter to qemuMigrationDstRun to skip waiting > qemu: implement qemuSaveImageLoadMultiFd for restore > tools: add parallel parameter to virsh save command > tools: add parallel parameter to virsh restore command > qemu: add migration parameter multifd-compression > libvirt: add new VIR_DOMAIN_SAVE_PARAM_PARALLEL_COMPRESSION > qemu: saveimage: add parallel compression argument to ImageCreate > qemu: saveimage: add stub support for multifd compression parameter > qemu: migration: expose qemuMigrationParamsSetString > qemu: saveimage: implement multifd-compression in parallel save > qemu: saveimage: restore compressed parallel images > tools: add parallel-compression parameter to virsh save command > > docs/manpages/virsh.rst | 26 +- > include/libvirt/libvirt-domain.h | 24 + > po/POTFILES | 1 + > src/libvirt_private.syms | 6 + > src/qemu/qemu_capabilities.c | 4 + > src/qemu/qemu_capabilities.h | 2 + > src/qemu/qemu_driver.c | 146 ++-- > src/qemu/qemu_migration.c | 160 ++-- > src/qemu/qemu_migration.h | 16 +- > src/qemu/qemu_migration_params.c | 71 +- > src/qemu/qemu_migration_params.h | 15 + > src/qemu/qemu_process.c | 3 +- > src/qemu/qemu_process.h | 5 +- > src/qemu/qemu_saveimage.c | 703 +++++++++++++----- > src/qemu/qemu_saveimage.h | 69 +- > src/qemu/qemu_snapshot.c | 6 +- > src/util/iohelper.c | 3 + > src/util/meson.build | 16 + > src/util/multifd-helper.c | 359 +++++++++ > src/util/virfile.c | 391 +++++++--- > src/util/virfile.h | 11 + > .../caps_4.0.0.aarch64.xml | 1 + > .../qemucapabilitiesdata/caps_4.0.0.ppc64.xml | 1 + > .../caps_4.0.0.riscv32.xml | 1 + > .../caps_4.0.0.riscv64.xml | 1 + > .../qemucapabilitiesdata/caps_4.0.0.s390x.xml | 1 + > .../caps_4.0.0.x86_64.xml | 1 + > .../caps_4.1.0.x86_64.xml | 1 + > .../caps_4.2.0.aarch64.xml | 1 + > .../qemucapabilitiesdata/caps_4.2.0.ppc64.xml | 1 + > .../qemucapabilitiesdata/caps_4.2.0.s390x.xml | 1 + > .../caps_4.2.0.x86_64.xml | 1 + > .../caps_5.0.0.aarch64.xml | 2 + > .../qemucapabilitiesdata/caps_5.0.0.ppc64.xml | 2 + > .../caps_5.0.0.riscv64.xml | 2 + > .../caps_5.0.0.x86_64.xml | 2 + > .../qemucapabilitiesdata/caps_5.1.0.sparc.xml | 2 + > .../caps_5.1.0.x86_64.xml | 2 + > .../caps_5.2.0.aarch64.xml | 2 + > .../qemucapabilitiesdata/caps_5.2.0.ppc64.xml | 2 + > .../caps_5.2.0.riscv64.xml | 2 + > .../qemucapabilitiesdata/caps_5.2.0.s390x.xml | 2 + > .../caps_5.2.0.x86_64.xml | 2 + > .../caps_6.0.0.aarch64.xml | 2 + > .../qemucapabilitiesdata/caps_6.0.0.s390x.xml | 2 + > .../caps_6.0.0.x86_64.xml | 2 + > .../caps_6.1.0.x86_64.xml | 2 + > .../caps_6.2.0.aarch64.xml | 2 + > .../qemucapabilitiesdata/caps_6.2.0.ppc64.xml | 2 + > .../caps_6.2.0.x86_64.xml | 2 + > .../caps_7.0.0.aarch64.xml | 2 + > .../qemucapabilitiesdata/caps_7.0.0.ppc64.xml | 2 + > .../caps_7.0.0.x86_64.xml | 2 + > .../caps_7.1.0.x86_64.xml | 2 + > tools/virsh-domain.c | 101 ++- > 55 files changed, 1748 insertions(+), 445 deletions(-) > create mode 100644 src/util/multifd-helper.c >