Re: [libvirt RFCv11 00/33] multifd save restore prototype

Daniel P. Berrangé <berrange@xxxxxxxxxx> · Wed, 11 Oct 2023 15:05:22 +0100

On Wed, Oct 11, 2023 at 03:46:59PM +0200, Claudio Fontana wrote:
> In terms of our use case, we would need to trigger these migrations from virsh save, restore, managedsave / start.
> 
> 1) Can you confirm this is still a good target?

IIRC the 'dump' command also has a codepath that can exercise
the migrate-to-file logic.

> It would seem right from my perspective to hook up save/restore first, and then reuse the same mechanism for managedsave / start.

All of save, restore, managedsave, start, dump end up calling
into the same internal helper methods. So once you update these
helpers, you essentially get all the commands converted in one
go.

> 2) Do we expect to pass filename or file descriptor from libvirt into QEMU?
> 
> 
> As is, libvirt today generally passes an already opened file descriptor to QEMU for migrations, roughly:
> 
> {"execute": "getfd", "arguments": {"fdname":"migrate"}} (passing the already open fd from libvirt f.e. 10)
> {"execute": "migrate", "arguments": {"detach":true,"blk":false,"inc":false,"uri":"fd:migrate"}}'
> 
> Do we want to change libvirt to migrate to a file: URI ? Does this have consequence for "labeling" / security sandboxing?
> 
> Or would it be better to continue opening the fd in libvirt, writing the libvirt header, and then passing the existing open fd to QEMU, using QMP command "getfd",
> followed by "migrate"? In this second case we would need to inform QEMU of the offset into the already open fd.

How about both :-)

The current migration 'fd' protocol technically can cope with
any type of FD being passed in. QEMU doesn't try to interpret
the FD type to any significant degree.

The 'file' protocol is explicitly providing a migration transport
supporting random access I/O to storage. As such we can specify
the offset too.

Now the neat trick is that 'file' protocol impl uses
qio_channel_file and this in turn uses qemu_open,
which supports FD passing.

Instead of using 'getfd' though we have to use 'add-fd'.

Anyway, this lets us do FD passing as normal, while also
letting us specify the offset.

 {"execute": "add-fd", "arguments": {"fdset-id":1}}
 {"execute": "migrate", "arguments": {"detach":true,"blk":false,"inc":false,"uri":"file:/dev/fdset/1,offset=124456"}}
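
As a rough illustration of the command pair above, here is a minimal
Python sketch that builds the two QMP messages. The helper name
build_save_commands is hypothetical, and it assumes a numeric fdset id,
since QEMU's add-fd takes 'fdset-id' as an integer. The FD itself would
travel as SCM_RIGHTS ancillary data on the monitor socket alongside the
add-fd command, which is not shown here.

```python
import json


def build_save_commands(fdset_id, offset):
    """Build the add-fd / migrate QMP pair for a save to an fdset.

    fdset_id: integer id of the fd set (assumed numeric).
    offset:   byte offset in the file where QEMU should start writing,
              i.e. just past the libvirt header.
    """
    add_fd = {
        "execute": "add-fd",
        "arguments": {"fdset-id": fdset_id},
    }
    migrate = {
        "execute": "migrate",
        "arguments": {
            "detach": True,
            "blk": False,
            "inc": False,
            # URI syntax follows the example in this mail.
            "uri": f"file:/dev/fdset/{fdset_id},offset={offset}",
        },
    }
    return [json.dumps(add_fd), json.dumps(migrate)]


commands = build_save_commands(1, 124456)
for line in commands:
    print(line)
```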

> Internally, the QEMU multifd code just reads and writes using pread, pwrite, so there is in any case just one fd to worry about,
> but who should own it, libvirt or QEMU?

How about both :-)

Libvirt will open the file, in order to write its header.
Then libvirt passes the open FD to QEMU, specifying the
offset, and QEMU does its thing with vmstate, etc and
closes the FD when it's done. Libvirt's copy of the FD
is still open, and libvirt can finalize its header and
close the FD.
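
The shared-ownership sequence above can be sketched in plain Python,
with both roles played in one process. This is only an illustration
under stated assumptions: HEADER_SIZE, the header layout, and the
function name save_with_shared_fd are made up, and os.dup() stands in
for the copy of the FD that QEMU would receive over the monitor socket.

```python
import os
import tempfile

HEADER_SIZE = 64  # hypothetical fixed header size, not libvirt's real one


def save_with_shared_fd(path, vmstate):
    # "libvirt" opens the file and reserves space for its header.
    fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        os.pwrite(fd, b"\0" * HEADER_SIZE, 0)

        # "QEMU" gets its own reference to the same open file and
        # writes the vmstate at the agreed offset using pwrite.
        qemu_fd = os.dup(fd)
        try:
            os.pwrite(qemu_fd, vmstate, HEADER_SIZE)
        finally:
            os.close(qemu_fd)  # QEMU closes its copy when done

        # libvirt's copy is still open, so it can finalize the header.
        header = b"HDR" + len(vmstate).to_bytes(8, "little")
        os.pwrite(fd, header.ljust(HEADER_SIZE, b"\0"), 0)
    finally:
        os.close(fd)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "guest.sav")
    save_with_shared_fd(path, b"vmstate-bytes")
    with open(path, "rb") as f:
        data = f.read()
```

Because both sides use positional I/O, neither depends on the other's
file position, which is why a single shared FD is enough.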

> 3) How do we deal with O_DIRECT? In the prototype we were setting the O_DIRECT on the fd from libvirt in response to the user request for --bypass-cache,
> which is needed 99% of the time with large VMs. I think I remember that we plan to write from libvirt normally (without O_DIRECT) and then set the flag later,
> but should libvirt or QEMU set the O_DIRECT flag? This likely depends on who owns the fd?

For O_DIRECT, the 'file' protocol should gain a new parameter
'bypass_cache: bool'. If this is set to 'true' then QEMU can
set O_DIRECT on the FD it opens or receives from libvirt.

Libvirt probably just has to be careful to unset O_DIRECT
at the end before it finalizes the header.
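
To illustrate toggling O_DIRECT on an FD that is already open, here is
a minimal sketch using fcntl(F_SETFL). The helper name set_direct_io is
hypothetical, and this is not how libvirt or QEMU implement it. Some
platforms lack O_DIRECT and some filesystems reject it, so the sketch
tolerates both cases.

```python
import fcntl
import os
import tempfile

# O_DIRECT is not defined on every platform; fall back to 0 (a no-op).
O_DIRECT = getattr(os, "O_DIRECT", 0)


def set_direct_io(fd, enable):
    """Set or clear O_DIRECT on an already open file descriptor."""
    flags = fcntl.fcntl(fd, fcntl.F_GETFL)
    if enable:
        new_flags = flags | O_DIRECT
    else:
        new_flags = flags & ~O_DIRECT
    fcntl.fcntl(fd, fcntl.F_SETFL, new_flags)
    return new_flags


with tempfile.TemporaryFile() as f:
    fd = f.fileno()
    try:
        # Bulk phase: cache bypass is wanted for the large aligned writes.
        set_direct_io(fd, True)
    except OSError:
        pass  # this filesystem does not support O_DIRECT

    # Before finalizing the small, unaligned header, clear the flag.
    set_direct_io(fd, False)
    os.pwrite(fd, b"header", 0)
    direct_still_set = bool(fcntl.fcntl(fd, fcntl.F_GETFL) & O_DIRECT)
```

The flag has to be cleared first because O_DIRECT writes generally need
aligned buffers, offsets and lengths, which a short header write will
not satisfy.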

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|