Daniel P. Berrangé <berrange@xxxxxxxxxx> writes:

> On Thu, Aug 08, 2024 at 05:38:03PM -0600, Jim Fehlig via Devel wrote:
>> Introduce support for QEMU's new mapped-ram stream format [1].
>> mapped-ram is enabled by default if the underlying QEMU advertises
>> the mapped-ram migration capability. It can be disabled by changing
>> the 'save_image_version' setting in qemu.conf to version '2'.
>>
>> To use mapped-ram with QEMU:
>> - The 'mapped-ram' migration capability must be set to true
>> - The 'multifd' migration capability must be set to true and
>>   the 'multifd-channels' migration parameter must be set to 1
>> - QEMU must be provided an fdset containing the migration fd
>> - The 'migrate' qmp command is invoked with a URI referencing the
>>   fdset and an offset at which to start writing the data stream, e.g.
>>
>>   {"execute":"migrate",
>>    "arguments":{"detach":true,"resume":false,
>>                 "uri":"file:/dev/fdset/0,offset=0x11921"}}
>>
>> The mapped-ram stream, in conjunction with direct IO and multifd
>> support provided by subsequent patches, can significantly improve
>> the time required to save VM memory state. The following tables
>> compare mapped-ram with the existing, sequential save stream. In
>> all cases, the save and restore operations are to/from a block
>> device comprised of two NVMe disks in RAID0 configuration with
>> xfs (~8600MiB/s). The values in the 'save time' and 'restore time'
>> columns were scraped from the 'real' time reported by time(1). The
>> 'Size' and 'Blocks' columns were provided by the corresponding
>> outputs of stat(1).
>>
>> VM: 32G RAM, 1 vcpu, idle (shortly after boot)
>>
>>                        | save    | restore |
>>                        | time    | time    | Size         | Blocks
>> -----------------------+---------+---------+--------------+--------
>> legacy                 | 6.193s  | 4.399s  | 985744812    | 1925288
>> -----------------------+---------+---------+--------------+--------
>> mapped-ram             | 5.109s  | 1.176s  | 34368554354  | 1774472
>
> I'm surprised by the restore time speed up, as I didn't think
> mapped-ram should make any perf difference without direct IO
> and multifd.
>
>> -----------------------+---------+---------+--------------+--------
>> legacy + direct IO     | 5.725s  | 4.512s  | 985765251    | 1925328
>> -----------------------+---------+---------+--------------+--------
>> mapped-ram + direct IO | 4.627s  | 1.490s  | 34368554354  | 1774304
>
> Still somewhat surprised by the speed up on restore here too

Hmm, I'm thinking this might be caused by zero page handling. The
non mapped-ram path has an extra buffer_is_zero() and memset() of
the hva page.

Now, is it an issue that mapped-ram skips that memset? I assume
guest memory will always be clear at the start of migration. There
won't be a situation where the destination VM starts with memory
already dirty... *and* the save file is also different, otherwise
it wouldn't make any difference.
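
To illustrate what I mean, here is a rough sketch of the destination-side
zero-page handling (illustration only, not the actual QEMU code; the
function names are made up and buffer_is_zero() is replaced by a trivial
stand-in):

#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for QEMU's buffer_is_zero(). */
static bool page_is_zero(const unsigned char *buf, size_t len)
{
    size_t i;

    for (i = 0; i < len; i++) {
        if (buf[i])
            return false;
    }
    return true;
}

/* Legacy stream restore: every zero page is represented in the stream,
 * and for each one the destination scans the corresponding host (hva)
 * page and clears it if it is not already zero. */
static void legacy_restore_zero_page(unsigned char *hva, size_t page_size)
{
    if (!page_is_zero(hva, page_size))
        memset(hva, 0, page_size);
}

/* mapped-ram restore: zero pages are simply not present in the file (the
 * per-block bitmap marks them as unwritten), so restore never reads or
 * touches them; it relies on freshly allocated guest memory already
 * being zero. */
static void mapped_ram_restore_zero_page(unsigned char *hva, size_t page_size)
{
    (void)hva;
    (void)page_size;
}

For an idle 32G guest that is almost entirely zero pages, the legacy path
still walks most of guest RAM on restore, which might explain the gap even
without direct IO or multifd.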
>
>> -----------------------+---------+---------+--------------+--------
>> mapped-ram + direct IO |         |         |              |
>>  + multifd-channels=8  | 4.421s  | 0.845s  | 34368554318  | 1774312
>> -------------------------------------------------------------------
>>
>> VM: 32G RAM, 30G dirty, 1 vcpu in tight loop dirtying memory
>>
>>                        | save    | restore |
>>                        | time    | time    | Size         | Blocks
>> -----------------------+---------+---------+--------------+---------
>> legacy                 | 25.800s | 14.332s | 33154309983  | 64754512
>> -----------------------+---------+---------+--------------+---------
>> mapped-ram             | 18.742s | 15.027s | 34368559228  | 64617160
>> -----------------------+---------+---------+--------------+---------
>> legacy + direct IO     | 13.115s | 18.050s | 33154310496  | 64754520
>> -----------------------+---------+---------+--------------+---------
>> mapped-ram + direct IO | 13.623s | 15.959s | 34368557392  | 64662040
>
> These figures make more sense with restore time matching save time
> more or less.
>
>> -----------------------+---------+---------+--------------+---------
>> mapped-ram + direct IO |         |         |              |
>>  + multifd-channels=8  | 6.994s  | 6.470s  | 34368554980  | 64665776
>> --------------------------------------------------------------------
>>
>> As can be seen from the tables, one caveat of mapped-ram is the logical
>> file size of a saved image is basically equivalent to the VM memory size.
>> Note however that mapped-ram typically uses fewer blocks on disk.
>>
>> Another caveat of mapped-ram is the requirement for a seekable file
>> descriptor, which currently makes it incompatible with libvirt's
>> support for save image compression. Also note the mapped-ram stream
>> is incompatible with the existing stream format, hence mapped-ram
>> cannot be used to restore an image saved with the existing format
>> and vice versa.
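
To put the Size/Blocks caveat in concrete numbers (assuming stat(1)'s usual
512-byte block units): the idle-VM mapped-ram image has a logical size of
34368554354 bytes (~32 GiB, essentially the guest RAM size), but only
1774472 blocks, i.e. roughly 1774472 * 512 ≈ 866 MiB actually allocated on
disk, slightly less than the ~940 MiB (1925288 blocks) of the legacy image.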
>>
>> [1] https://gitlab.com/qemu-project/qemu/-/blob/master/docs/devel/migration/mapped-ram.rst?ref_type=heads
>>
>> Signed-off-by: Jim Fehlig <jfehlig@xxxxxxxx>
>> ---
>>  src/qemu/qemu_driver.c    |  20 ++++--
>>  src/qemu/qemu_migration.c | 139 ++++++++++++++++++++++++++------------
>>  src/qemu/qemu_migration.h |   4 +-
>>  src/qemu/qemu_monitor.c   |  36 ++++++++++
>>  src/qemu/qemu_monitor.h   |   4 ++
>>  src/qemu/qemu_saveimage.c |  43 +++++++++---
>>  src/qemu/qemu_saveimage.h |   2 +
>>  src/qemu/qemu_snapshot.c  |   9 ++-
>>  8 files changed, 195 insertions(+), 62 deletions(-)
>>
>
>
>
>> diff --git a/src/qemu/qemu_saveimage.c b/src/qemu/qemu_saveimage.c
>> index 6f2ce40124..98a1ad638d 100644
>> --- a/src/qemu/qemu_saveimage.c
>> +++ b/src/qemu/qemu_saveimage.c
>> @@ -96,6 +96,7 @@ G_DEFINE_AUTOPTR_CLEANUP_FUNC(virQEMUSaveData, virQEMUSaveDataFree);
>>   */
>>  virQEMUSaveData *
>>  virQEMUSaveDataNew(virQEMUDriver *driver,
>> +                   virDomainObj *vm,
>>                     char *domXML,
>>                     qemuDomainSaveCookie *cookieObj,
>>                     bool running,
>> @@ -115,6 +116,19 @@ virQEMUSaveDataNew(virQEMUDriver *driver,
>>      header = &data->header;
>>      memcpy(header->magic, QEMU_SAVE_PARTIAL, sizeof(header->magic));
>>      header->version = cfg->saveImageVersion;
>> +
>> +    /* Enable mapped-ram feature if available and save version >= 3 */
>> +    if (header->version >= QEMU_SAVE_VERSION &&
>> +        qemuMigrationCapsGet(vm, QEMU_MIGRATION_CAP_MAPPED_RAM)) {
>> +        if (compressed != QEMU_SAVE_FORMAT_RAW) {
>> +            virReportError(VIR_ERR_OPERATION_FAILED,
>> +                           _("compression is not supported with save image version %1$u"),
>> +                           header->version);
>> +            goto error;
>> +        }
>> +        header->features |= QEMU_SAVE_FEATURE_MAPPED_RAM;
>> +    }
>
> If the QEMU we're using doesn't have CAP_MAPPED_RAM, then I think
> we should NOT default to Version 3 save images, as that's creating
> a backcompat problem for zero user benefit.
>
> This suggests that in qemu_conf.c, we should initialize the
> default value to '0', and then in this code, if we see
> version 0 we should pick either 2 or 3 depending on mapped
> ram.
>
>> +
>> +    header->was_running = running ? 1 : 0;
>> +    header->compressed = compressed;
>>
>
> With regards,
> Daniel
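
If I'm reading the suggestion right, it would look roughly like this in
virQEMUSaveDataNew() (just a sketch, not code from the series; it assumes
cfg->saveImageVersion defaults to 0 in qemu_conf.c, meaning "let libvirt
choose", and reuses the names from the hunk above):

    header->version = cfg->saveImageVersion;

    /* Version 0 means no explicit request in qemu.conf: pick v3 only when
     * this QEMU can actually do mapped-ram, otherwise stay on v2 so older
     * libvirt can still read the image. */
    if (header->version == 0) {
        if (qemuMigrationCapsGet(vm, QEMU_MIGRATION_CAP_MAPPED_RAM))
            header->version = QEMU_SAVE_VERSION;
        else
            header->version = 2;
    }

    /* Enable mapped-ram feature if available and save version >= 3 */
    if (header->version >= QEMU_SAVE_VERSION &&
        qemuMigrationCapsGet(vm, QEMU_MIGRATION_CAP_MAPPED_RAM)) {
        if (compressed != QEMU_SAVE_FORMAT_RAW) {
            virReportError(VIR_ERR_OPERATION_FAILED,
                           _("compression is not supported with save image version %1$u"),
                           header->version);
            goto error;
        }
        header->features |= QEMU_SAVE_FEATURE_MAPPED_RAM;
    }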