On Wed, Oct 11, 2023 at 04:56:12PM +0200, Claudio Fontana wrote:
> 
> On 10/11/23 16:05, Daniel P. Berrangé wrote:
> > 
> > Instead of using 'getfd' though we have to use 'add-fd'.
> > 
> > Anyway, this lets us do FD passing as normal, while also
> > letting us specify the offset.
> > 
> >  {"execute": "add-fd", "arguments": {"fdset-id":"migrate"}}
> >  {"execute": "migrate", "arguments": {"detach":true,"blk":false,"inc":false,"uri":"file:/dev/fdset/migrate,offset=124456"}}
> > 
> >> Internally, the QEMU multifd code just reads and writes using pread, pwrite, so there is in any case just one fd to worry about,
> >> but who should own it, libvirt or QEMU?
> > 
> > How about both :-)
> 
> I need to familiarize a bit with this, there are pieces I am missing. Can you correct here?
> 
> OPTION 1)
> 
> libvirt opens the file and has the FD, writes the header, marks the offset,
> then we dup the FD in libvirt for the benefit of QEMU, optionally set the flags of the dup to "O_DIRECT" (the usual case) depending on --bypass-cache,
> pass the duped FD to QEMU,
> QEMU does all the pread/pwrite on it with the correct offset (since it knows it from the file:// URI optional offset parameter),
> then libvirt closes the duped fd
> libvirt rewrites the header using the original fd (needed to update the metadata),
> libvirt closes the original fd
> 
> 
> OPTION 2)
> 
> libvirt opens the file and has the FD, writes the header, marks the offset,
> then we pass the FD to QEMU,
> QEMU dups the FD and sets it as "O_DIRECT" depending on a passed parameter,
> QEMU does all the pread/pwrite on it with the correct offset (since it knows it from the file:// URI optional offset parameter),
> QEMU closes the duped FD,
> libvirt rewrites the header using the original fd (needed to update the metadata),
> libvirt closes the original fd
> 
> 
> I don't remember if QEMU changes for the file offsets optimization are already "block friendly" ie they operate correctly whatever the state of O_DIRECT or ~O_DIRECT,
> I think so. They have been thought with O_DIRECT in mind.

The 'file' protocol as it exists currently is not O_DIRECT capable.
It is not writing aligned buffers to aligned offsets in the file.
It is still running the regular old migration stream format over
the file, not taking advantage of it being random access.

What's needed is the followup "fixed ram" format adaptation. Use
of that format should imply O_DIRECT, so in fact we don't need
an explicit 'bypass_cache' parameter in QAPI, just a way to ask
for the 'fixed ram' format.

> So I would tend to see OPTION 1) as more attractive as QEMU does not need to care about another parameter, whatever has been chosen in libvirt in terms of bypass cache is handled in libvirt.

The 'fixed ram' format will only take care of I/O for the main
RAM blocks, which are nicely aligned and can be written to aligned
file offsets.

The general device vmstate I/O probably can't be assumed to be
aligned. While we could futz around with QEMUFile so that it
bounce buffers vmstate to an aligned region and flushes it in
page sized chunks, that's probably too much of a pain.

IOW, actually I think what QEMU would likely want to do is

 1. qemu_open -> get a FD *without* O_DIRECT set
 2. write some vmstate stuff
 3. turn on O_DIRECT
 4. write RAM in fixed locations
 5. turn off O_DIRECT
 6. write remaining vmstate

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
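
For illustration, the six-step sequence above could be sketched roughly as below on Linux. This is not QEMU code: set_direct_io(), align_up() and save_sketch() are hypothetical names, plain open() stands in for qemu_open, and the two short strings stand in for vmstate. It also assumes that a filesystem refusing O_DIRECT should simply fall back to buffered writes, which is my choice for the sketch and not something decided in this thread.

```c
#define _GNU_SOURCE             /* needed for O_DIRECT with glibc */
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: switch O_DIRECT on or off for an open fd. */
static int set_direct_io(int fd, int enable)
{
    int flags = fcntl(fd, F_GETFL);
    if (flags < 0) {
        return -1;
    }
    flags = enable ? (flags | O_DIRECT) : (flags & ~O_DIRECT);
    return fcntl(fd, F_SETFL, flags);
}

/* Hypothetical helper: round off up to a power-of-two alignment. */
static off_t align_up(off_t off, off_t align)
{
    assert(align > 0 && (align & (align - 1)) == 0);
    return (off + align - 1) & ~(align - 1);
}

/*
 * Illustrative only. ram_len must be a multiple of align.
 * Returns the file offset at which RAM was written, or -1 on error.
 */
off_t save_sketch(const char *path, const void *ram, size_t ram_len,
                  size_t align)
{
    static const char early[] = "early-vmstate";   /* placeholder data */
    static const char late[] = "late-vmstate";     /* placeholder data */
    void *bounce = NULL;
    off_t ram_off = -1, ret = -1;

    /* 1. open the file *without* O_DIRECT */
    int fd = open(path, O_CREAT | O_TRUNC | O_RDWR, 0600);
    if (fd < 0) {
        return -1;
    }

    /* 2. unaligned vmstate goes through the page cache */
    if (pwrite(fd, early, sizeof(early), 0) != (ssize_t)sizeof(early)) {
        goto out;
    }

    /* 3. turn on O_DIRECT (assumption: stay buffered if it is refused) */
    (void)set_direct_io(fd, 1);

    /* 4. RAM: aligned buffer, aligned length, aligned file offset */
    ram_off = align_up((off_t)sizeof(early), (off_t)align);
    if (ram_len % align != 0 ||
        posix_memalign(&bounce, align, ram_len) != 0) {
        goto out;
    }
    memcpy(bounce, ram, ram_len);
    if (pwrite(fd, bounce, ram_len, ram_off) != (ssize_t)ram_len) {
        goto out;
    }

    /* 5. turn off O_DIRECT again */
    if (set_direct_io(fd, 0) < 0) {
        goto out;
    }

    /* 6. remaining vmstate, buffered, placed after the RAM region */
    if (pwrite(fd, late, sizeof(late), ram_off + (off_t)ram_len)
        != (ssize_t)sizeof(late)) {
        goto out;
    }
    ret = ram_off;
out:
    free(bounce);
    close(fd);
    return ret;
}
```

Calling save_sketch(path, ram, ram_len, 4096) with a page-multiple ram_len leaves the RAM payload at the first 4096-byte boundary after the early vmstate. Real guest RAM is already page aligned, so the bounce buffer is only there to keep the sketch safe for arbitrary callers.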