On Wed, May 11, 2022 at 01:52:05PM +0200, Claudio Fontana wrote:
> On 5/11/22 11:51 AM, Daniel P. Berrangé wrote:
> > On Wed, May 11, 2022 at 09:26:10AM +0200, Claudio Fontana wrote:
> >> Hi Daniel,
> >>
> >> thanks for looking at this,
> >>
> >> On 5/10/22 8:38 PM, Daniel P. Berrangé wrote:
> >>> On Sat, May 07, 2022 at 03:42:53PM +0200, Claudio Fontana wrote:
> >>>> This is v8 of the multifd save prototype, which fixes a few bugs,
> >>>> adds a few more code splits, and records the number of channels
> >>>> as well as the compression algorithm, so the restore command is
> >>>> more user-friendly.
> >>>>
> >>>> It is now possible to just say:
> >>>>
> >>>> virsh save mydomain /mnt/saves/mysave --parallel
> >>>>
> >>>> virsh restore /mnt/saves/mysave --parallel
> >>>>
> >>>> and things work with the default of 2 channels, no compression.
> >>>>
> >>>> It is of course also possible to say:
> >>>>
> >>>> virsh save mydomain /mnt/saves/mysave --parallel
> >>>>   --parallel-connections 16 --parallel-compression zstd
> >>>>
> >>>> virsh restore /mnt/saves/mysave --parallel
> >>>>
> >>>> and things also work fine, due to the channels and compression
> >>>> being stored in the main save file.
> >>>
> >>> For the sake of people following along, the above commands will
> >>> result in the creation of multiple files:
> >>>
> >>>   /mnt/saves/mysave
> >>>   /mnt/saves/mysave.0
> >>
> >> just a minor correction, there is no .0
> >
> > Heh, off-by-1
> >
> >>
> >>>   /mnt/saves/mysave.1
> >>>   ....
> >>>   /mnt/saves/mysave.n
> >>>
> >>> Where 'n' is the number of threads used.
> >>>
> >>> Overall I'm not very happy with the approach of doing any of this
> >>> on the libvirt side.
> >>
> >>
> >> Ok, I understand your concern.
> >>
> >>>
> >>> Backing up, we know that QEMU can directly save to disk faster than
> >>> libvirt can. We mitigated a lot of that overhead with previous patches
> >>> to increase the pipe buffer size, but some still remains due to the
> >>> extra copies inherent in handing this off to libvirt.
> >>
> >> Right;
> >> still, the performance we get is insufficient for the use case we are
> >> trying to address, even without libvirt in the picture.
> >>
> >> Instead, with parallel save + compression we can make the numbers add up.
> >> For parallel save using multifd, the overhead of libvirt is negligible.
> >>
> >>>
> >>> Using multifd on the libvirt side, IIUC, gets us better performance
> >>> than QEMU can manage if doing a non-multifd write to file directly,
> >>> but we still have the extra copies in there due to the hand off
> >>> to libvirt. If QEMU were directly capable of writing to
> >>> disk with multifd, it should beat us again.
> >>
> >> Hmm, I am thinking about this point, and at first glance I don't
> >> think this is 100% accurate;
> >>
> >> if we do a parallel save like in this series with multifd,
> >> the overhead of libvirt is almost non-existent in my view
> >> compared with doing it with qemu only, skipping libvirt;
> >> it is limited to the one iohelper for the main channel
> >> (which is the smallest of the transfers),
> >> and maybe this could be removed as well.
> >
> > Libvirt adds overhead due to the multiple data copies in
> > the save process. Using multifd doesn't get rid of this
> > overhead, it merely distributes the overhead across many
> > CPUs. The overall wallclock time is reduced, but in aggregate
> > the CPUs still have the same amount of total work to do
> > copying data around.
> >
> > I don't recall the scale of the libvirt overhead that remains
> > after the pipe buffer optimizations, but whatever is left is
> > still taking up host CPU time that could be used for other guests.
> >
> > It also just occurred to me that currently our save/restore
> > approach is bypassing all resource limits applied to the
> > guest, e.g. block I/O rate limits, CPU affinity controls,
> > etc, because most of the work is done in the iohelper.
> > If we had this done in QEMU, then the save/restore process
> > would be confined by the existing CPU affinity / I/O limits
> > applied to the guest. This means we would not negatively
> > impact other co-hosted guests to the same extent.
> >
> >> This is because even without libvirt in the picture, we
> >> are still migrating to a socket, and something needs to
> >> transfer data from that socket to a file. At that point
> >> I think both libvirt and a custom-made script are in the
> >> same position.
> >
> > If QEMU had explicit support for a "file" backend, there
> > would be no socket involved at all. QEMU would be copying
> > guest RAM directly to a file with no intermediate steps.
> > If QEMU mmap'd the save state file, then saving the
> > guest RAM could even possibly reduce to a mere 'memcpy()'
>
> Agree, but still, to align with your requirement to have only one file,
> libvirt would need to add some padding after the libvirt header and
> before the QEMU VM state starts in the file, so that the QEMU VM state
> starts at a block-friendly address.

That's trivial, as we already add padding in this place.

With regards,
Daniel
-- 
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
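
As a concrete illustration of the block-friendly padding discussed above,
here is a minimal sketch in C. The helper names (round_up_block,
pad_header_to_block) and the 4 KiB alignment are assumptions for
illustration only, not libvirt's actual helpers or on-disk layout; the
idea is simply to round the end of the header up to a block boundary and
zero-fill the gap so the QEMU VM state starts at an aligned file offset.

  #include <stdint.h>
  #include <string.h>
  #include <unistd.h>

  /* Round 'off' up to the next multiple of 'align' (a power of two). */
  static uint64_t round_up_block(uint64_t off, uint64_t align)
  {
      return (off + align - 1) & ~(align - 1);
  }

  /*
   * Zero-fill the file between the end of the header ('hdrlen' bytes)
   * and the next block boundary, returning the aligned offset at which
   * the VM state should be written, or (uint64_t)-1 on error.
   */
  static uint64_t pad_header_to_block(int fd, uint64_t hdrlen, uint64_t align)
  {
      uint64_t start = round_up_block(hdrlen, align);
      char zeros[4096];
      uint64_t off;

      memset(zeros, 0, sizeof(zeros));
      for (off = hdrlen; off < start; ) {
          size_t chunk = (start - off < sizeof(zeros)) ? start - off
                                                       : sizeof(zeros);
          ssize_t n = pwrite(fd, zeros, chunk, off);
          if (n <= 0)
              return (uint64_t)-1;
          off += n;
      }
      return start;  /* e.g. hdrlen=937, align=4096 -> VM state at 4096 */
  }

With a hypothetical header of 937 bytes and 4 KiB alignment, the VM
state would then begin at offset 4096, which is what makes an eventual
mmap()/O_DIRECT-style write of guest RAM by QEMU practical.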