Re: [PATCH 0/9] Allow sparse streams for block devices

Daniel P. Berrangé <berrange@xxxxxxxxxx> · Fri, 3 Jul 2020 12:53:32 +0100

On Fri, Jul 03, 2020 at 01:50:22PM +0200, Michal Privoznik wrote:
> On 7/3/20 1:34 PM, Daniel P. Berrangé wrote:
> > On Fri, Jul 03, 2020 at 12:28:41PM +0200, Michal Privoznik wrote:
> > > The way our sparse streams are implemented is:
> > > 
> > > 1) user facing APIs (virStreamSparseRecvAll() and virStreamSparseSendAll()) take
> > >     callbacks as arguments. These callbacks read/write data or determine if there
> > >     is a hole in the underlying file and big it is.
> > > 
> > > 2) libvirtd has something similar - virFDStream, except here the functions for
> > >     read/write of data and hole handling are called directly.
> > > 
> > > Sparse streams were originally implemented for regular files only => both ends
> > > of stream has to be regular files. This limitation exists because the callbacks
> > > from 1) (implemented in virsh for vol-download/vol-upload commands) and also
> > > from 2) (which is basically the same code) uses lseek(..., SEEK_DATA) and/or
> > > lseek(..., SEEK_HOLE) to get map of allocated file blocks. They also take a
> > > shortcut (valid for regular files) - when one side of the stream is asked to
> > > create a hole it merely lseek() + ftruncate(). For regular files this creates a
> > > hole and later, when somebody reads it all they get is zeroes.
> > > 
> > > Neither of these two approaches work for block devices. Block devices have no
> > > notion of data/hole sections [1], nor can they be truncated. What we can do
> > > instead is read data from the block device and check if its full of zeroes. And
> > > for "creating a hole" we just write zeroes of requested size.
> > > 
> > > There is a follow up patch that I am working on: this implementation I'm
> > > posting here has one disadvantage: after some blocks are read from the
> > > block device and they are found to contain data, the whole buffer is
> > > freed only to be read again. For instance, when uploading volume,
> > > virshStreamInData() is called at the beginning to check if the file
> > > containing data to upload doesn't start with a hole. If the file is a
> > > block device, then virFileInDataDetectZeroes() is called which reads
> > > 1MiB from it, finds (say) data and throws the buffer away. Then
> > > virshStreamSource() is called, which reads the 1MiB buffer again.
> > > The patch is still under development though.
> > 
> > Was there a particular user/app feature request for this support ?
> 
> Yes, see BZ linked in 9/9.
> 
> https://bugzilla.redhat.com/show_bug.cgi?id=1852528
> 
> Apparently, VDSM uses streams to copy volumes around.

Ah I see. We might want to ask them if they would find it useful if we
added an NBD alternative.

> > I'm wondering about the likely use case, because if I was starting
> > over from scratch I'd never implement stream support for storage
> > volumes. Instead I would add APIs for starting/stopping a qemu-nbd
> > server attached to the volume. Probably don't even need a start/
> > stop pair, could just run in single client mode where we pass back
> > an opened client FD, and have qemu-nbd exit when this is closed.
> > 
> > Depending on the user requesting sparse support for blockdevs it
> > may still make sense to provide them an NBD solution, especially
> > if they're likely to have followup feature requests already handled
> > by NBD.
> 
> Agreed, streams should have been for console and screenshot only. They are
> strictly worse for handling large files than scp/nbd/rsync/... because it
> all has to go through our event loop. And client even loop.
> On the other hand, they are multiplexed within virCommand which is an
> advantage that neither of the aforementioned tools have (no need to open a
> special port then).

If qemu-nbd listen on a private UNIX socket, and we pass back a FD connected
to that socket, there's no need to deal with firewalls or permissions.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|