On 3/26/22 4:49 PM, Claudio Fontana wrote:
> On 3/25/22 12:29 PM, Daniel P. Berrangé wrote:
>> On Fri, Mar 18, 2022 at 02:34:29PM +0100, Claudio Fontana wrote:
>>> On 3/17/22 4:03 PM, Dr. David Alan Gilbert wrote:
>>>> * Claudio Fontana (cfontana@xxxxxxx) wrote:
>>>>> On 3/17/22 2:41 PM, Claudio Fontana wrote:
>>>>>> On 3/17/22 11:25 AM, Daniel P. Berrangé wrote:
>>>>>>> On Thu, Mar 17, 2022 at 11:12:11AM +0100, Claudio Fontana wrote:
>>>>>>>> On 3/16/22 1:17 PM, Claudio Fontana wrote:
>>>>>>>>> On 3/14/22 6:48 PM, Daniel P. Berrangé wrote:
>>>>>>>>>> On Mon, Mar 14, 2022 at 06:38:31PM +0100, Claudio Fontana wrote:
>>>>>>>>>>> On 3/14/22 6:17 PM, Daniel P. Berrangé wrote:
>>>>>>>>>>>> On Sat, Mar 12, 2022 at 05:30:01PM +0100, Claudio Fontana wrote:
>>>>>>>>>>>>> the first user is the qemu driver,
>>>>>>>>>>>>>
>>>>>>>>>>>>> virsh save/resume would slow to a crawl with a default pipe size (64k).
>>>>>>>>>>>>>
>>>>>>>>>>>>> This improves the situation by 400%.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Going through io_helper still seems to incur in some penalty (~15%-ish)
>>>>>>>>>>>>> compared with direct qemu migration to a nc socket to a file.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Signed-off-by: Claudio Fontana <cfontana@xxxxxxx>
>>>>>>>>>>>>> ---
>>>>>>>>>>>>>  src/qemu/qemu_driver.c    |  6 +++---
>>>>>>>>>>>>>  src/qemu/qemu_saveimage.c | 11 ++++++-----
>>>>>>>>>>>>>  src/util/virfile.c        | 12 ++++++++++++
>>>>>>>>>>>>>  src/util/virfile.h        |  1 +
>>>>>>>>>>>>>  4 files changed, 22 insertions(+), 8 deletions(-)
>>>>>>>>>>>>>
>>>>>>>>>>>>> Hello, I initially thought this to be a qemu performance issue,
>>>>>>>>>>>>> so you can find the discussion about this in qemu-devel:
>>>>>>>>>>>>>
>>>>>>>>>>>>> "Re: bad virsh save /dev/null performance (600 MiB/s max)"
>>>>>>>>>>>>>
>>>>>>>>>>>>> https://lists.gnu.org/archive/html/qemu-devel/2022-03/msg03142.html
>>>>>>>
>>>>>>>
>>>>>>>> Current results show these experimental averages maximum throughput
>>>>>>>> migrating to /dev/null per each FdWrapper Pipe Size (as per QEMU QMP
>>>>>>>> "query-migrate", tests repeated 5 times for each).
>>>>>>>> VM Size is 60G, most of the memory effectively touched before migration,
>>>>>>>> through user application allocating and touching all memory with
>>>>>>>> pseudorandom data.
>>>>>>>>
>>>>>>>> 64K:   5200 Mbps (current situation)
>>>>>>>> 128K:  5800 Mbps
>>>>>>>> 256K: 20900 Mbps
>>>>>>>> 512K: 21600 Mbps
>>>>>>>> 1M:   22800 Mbps
>>>>>>>> 2M:   22800 Mbps
>>>>>>>> 4M:   22400 Mbps
>>>>>>>> 8M:   22500 Mbps
>>>>>>>> 16M:  22800 Mbps
>>>>>>>> 32M:  22900 Mbps
>>>>>>>> 64M:  22900 Mbps
>>>>>>>> 128M: 22800 Mbps
>>>>>>>>
>>>>>>>> This above is the throughput out of patched libvirt with multiple Pipe Sizes for the FDWrapper.
>>>>>>>
>>>>>>> Ok, its bouncing around with noise after 1 MB. So I'd suggest that
>>>>>>> libvirt attempt to raise the pipe limit to 1 MB by default, but
>>>>>>> not try to go higher.
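(For context, the mechanism being discussed here is the Linux-specific fcntl() F_SETPIPE_SZ operation. Below is a minimal sketch of what such a helper could look like; the function name, the separate "desired" parameter and the error handling are illustrative only, not what the patch actually adds to src/util/virfile.c.)

#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>

/* Illustrative sketch only: try to raise a pipe's capacity to the
 * desired size. The kernel may refuse or clamp the request (see
 * /proc/sys/fs/pipe-max-size), so a failure here is reported but is
 * not necessarily fatal for the caller. */
static int
examplePipeSizeIncrease(int fd, int desired)
{
    int cur = fcntl(fd, F_GETPIPE_SZ);

    if (cur < 0)
        return -1;
    if (cur >= desired)
        return 0;                       /* already large enough */
    if (fcntl(fd, F_SETPIPE_SZ, desired) < 0) {
        fprintf(stderr, "F_SETPIPE_SZ(%d) failed: %s\n",
                desired, strerror(errno));
        return -1;
    }
    return 0;
}

Such a helper would be called on either end of the pipe right after it is created, e.g. examplePipeSizeIncrease(pipefd[1], 1024 * 1024), before the iohelper starts copying.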
>>>>>>>
>>>>>>>> As for the theoretical limit for the libvirt architecture,
>>>>>>>> I ran a qemu migration directly issuing the appropriate QMP
>>>>>>>> commands, setting the same migration parameters as per libvirt,
>>>>>>>> and then migrating to a socket netcatted to /dev/null via
>>>>>>>> {"execute": "migrate", "arguments": { "uri", "unix:///tmp/netcat.sock" } } :
>>>>>>>>
>>>>>>>> QMP: 37000 Mbps
>>>>>>>
>>>>>>>> So although the Pipe size improves things (in particular the
>>>>>>>> large jump is for the 256K size, although 1M seems a very good value),
>>>>>>>> there is still a second bottleneck in there somewhere that
>>>>>>>> accounts for a loss of ~14200 Mbps in throughput.
>>>>>
>>>>>
>>>>> Interesting addition: I tested quickly on a system with faster cpus and larger VM sizes, up to 200GB,
>>>>> and the difference in throughput libvirt vs qemu is basically the same ~14500 Mbps.
>>>>>
>>>>> ~50000 mbps qemu to netcat socket to /dev/null
>>>>> ~35500 mbps virsh save to /dev/null
>>>>>
>>>>> Seems it is not proportional to cpu speed by the looks of it (not a totally fair comparison because the VM sizes are different).
>>>>
>>>> It might be closer to RAM or cache bandwidth limited though; for an extra copy.
>>>
>>> I was thinking about sendfile(2) in iohelper, but that probably
>>> can't work as the input fd is a socket, I am getting EINVAL.
>>
>> Yep, sendfile() requires the input to be a mmapable FD,
>> and the output to be a socket.
>>
>> Try splice() instead which merely requires 1 end to be a
>> pipe, and the other end can be any FD afaik.
>>
>> With regards,
>> Daniel
>>
>
> I did try splice(), but performance is worse by around 500%.
>
> It also fails with EINVAL when trying to use it in combination with O_DIRECT.
>
> Tried larger and smaller buffers, and flags like SPLICE_F_MORE and SPLICE_F_MOVE in any combination; no change, just awful performance.

Ok, I found a case where splice actually helps: in the read case, without O_DIRECT,
splice seems to actually outperform read/write by _a lot_.

I will code up the patch and run more experiments with larger VM sizes etc.

Thanks!

Claudio

>
> Here is the code:
>
> #ifdef __linux__
> +static ssize_t safesplice(int fdin, int fdout, size_t todo)
> +{
> +    unsigned int flags = SPLICE_F_MOVE | SPLICE_F_MORE;
> +    ssize_t ncopied = 0;
> +
> +    while (todo > 0) {
> +        ssize_t r = splice(fdin, NULL, fdout, NULL, todo, flags);
> +        if (r < 0 && errno == EINTR)
> +            continue;
> +        if (r < 0)
> +            return r;
> +        if (r == 0)
> +            return ncopied;
> +        todo -= r;
> +        ncopied += r;
> +    }
> +    return ncopied;
> +}
> +
> +static ssize_t runIOCopy(const struct runIOParams p)
> +{
> +    size_t len = 1024 * 1024;
> +    ssize_t total = 0;
> +
> +    while (1) {
> +        ssize_t got = safesplice(p.fdin, p.fdout, len);
> +        if (got < 0)
> +            return -1;
> +        if (got == 0)
> +            break;
> +
> +        total += got;
> +
> +        /* handle last write truncate in direct case */
> +        if (got < len && p.isDirect && p.isWrite && !p.isBlockDev) {
> +            if (ftruncate(p.fdout, total) < 0) {
> +                return -4;
> +            }
> +            break;
> +        }
> +    }
> +    return total;
> +}
> +
> +#endif
>
>
> Any ideas welcome,
>
> Claudio
>
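As a standalone way to reproduce the read/write vs. splice comparison outside of libvirt, something like the following rough harness could be used (this is not part of the patch or of libvirt; the program, the chunk size and the command-line interface are made up for illustration). It copies stdin, which is expected to be a pipe such as the one feeding the iohelper, to the given output file using either a plain read()/write() loop or splice():

#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define CHUNK (1024 * 1024)     /* same 1 MiB chunk size as the snippet above */

/* Plain read()/write() copy loop. */
static ssize_t copy_rw(int in, int out)
{
    static char buf[CHUNK];
    ssize_t total = 0;

    for (;;) {
        ssize_t r = read(in, buf, sizeof(buf));
        if (r < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (r == 0)
            return total;
        for (ssize_t off = 0; off < r; ) {
            ssize_t w = write(out, buf + off, r - off);
            if (w < 0) {
                if (errno == EINTR)
                    continue;
                return -1;
            }
            off += w;
        }
        total += r;
    }
}

/* splice() copy loop; stdin must be a pipe for this path to work. */
static ssize_t copy_splice(int in, int out)
{
    ssize_t total = 0;

    for (;;) {
        ssize_t r = splice(in, NULL, out, NULL, CHUNK,
                           SPLICE_F_MOVE | SPLICE_F_MORE);
        if (r < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (r == 0)
            return total;
        total += r;
    }
}

int main(int argc, char **argv)
{
    int out;
    ssize_t n;

    if (argc < 3) {
        fprintf(stderr, "usage: %s <outfile> rw|splice  (data on stdin)\n", argv[0]);
        return 1;
    }
    /* Note: no O_DIRECT here on purpose; that is the case where splice
     * was reported above to fail with EINVAL. */
    out = open(argv[1], O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (out < 0) {
        perror("open");
        return 1;
    }
    n = strcmp(argv[2], "splice") == 0 ? copy_splice(STDIN_FILENO, out)
                                       : copy_rw(STDIN_FILENO, out);
    if (n < 0) {
        perror("copy");
        return 1;
    }
    fprintf(stderr, "copied %zd bytes\n", n);
    return close(out) < 0;
}

Feeding it the same data through a pipe and timing the "splice" and "rw" modes against each other should make the gap visible without involving libvirt or QEMU at all.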