On Fri, Jul 03, 2020 at 01:50:22PM +0200, Michal Privoznik wrote: > On 7/3/20 1:34 PM, Daniel P. Berrangé wrote: > > On Fri, Jul 03, 2020 at 12:28:41PM +0200, Michal Privoznik wrote: > > > The way our sparse streams are implemented is: > > > > > > 1) user facing APIs (virStreamSparseRecvAll() and virStreamSparseSendAll()) take > > > callbacks as arguments. These callbacks read/write data or determine if there > > > is a hole in the underlying file and big it is. > > > > > > 2) libvirtd has something similar - virFDStream, except here the functions for > > > read/write of data and hole handling are called directly. > > > > > > Sparse streams were originally implemented for regular files only => both ends > > > of stream has to be regular files. This limitation exists because the callbacks > > > from 1) (implemented in virsh for vol-download/vol-upload commands) and also > > > from 2) (which is basically the same code) uses lseek(..., SEEK_DATA) and/or > > > lseek(..., SEEK_HOLE) to get map of allocated file blocks. They also take a > > > shortcut (valid for regular files) - when one side of the stream is asked to > > > create a hole it merely lseek() + ftruncate(). For regular files this creates a > > > hole and later, when somebody reads it all they get is zeroes. > > > > > > Neither of these two approaches work for block devices. Block devices have no > > > notion of data/hole sections [1], nor can they be truncated. What we can do > > > instead is read data from the block device and check if its full of zeroes. And > > > for "creating a hole" we just write zeroes of requested size. > > > > > > There is a follow up patch that I am working on: this implementation I'm > > > posting here has one disadvantage: after some blocks are read from the > > > block device and they are found to contain data, the whole buffer is > > > freed only to be read again. For instance, when uploading volume, > > > virshStreamInData() is called at the beginning to check if the file > > > containing data to upload doesn't start with a hole. If the file is a > > > block device, then virFileInDataDetectZeroes() is called which reads > > > 1MiB from it, finds (say) data and throws the buffer away. Then > > > virshStreamSource() is called, which reads the 1MiB buffer again. > > > The patch is still under development though. > > > > Was there a particular user/app feature request for this support ? > > Yes, see BZ linked in 9/9. > > https://bugzilla.redhat.com/show_bug.cgi?id=1852528 > > Apparently, VDSM uses streams to copy volumes around. Ah I see. We might want to ask them if they would find it useful if we added an NBD alternative. > > I'm wondering about the likely use case, because if I was starting > > over from scratch I'd never implement stream support for storage > > volumes. Instead I would add APIs for starting/stopping a qemu-nbd > > server attached to the volume. Probably don't even need a start/ > > stop pair, could just run in single client mode where we pass back > > an opened client FD, and have qemu-nbd exit when this is closed. > > > > Depending on the user requesting sparse support for blockdevs it > > may still make sense to provide them an NBD solution, especially > > if they're likely to have followup feature requests already handled > > by NBD. > > Agreed, streams should have been for console and screenshot only. They are > strictly worse for handling large files than scp/nbd/rsync/... because it > all has to go through our event loop. And client even loop. > On the other hand, they are multiplexed within virCommand which is an > advantage that neither of the aforementioned tools have (no need to open a > special port then). If qemu-nbd listen on a private UNIX socket, and we pass back a FD connected to that socket, there's no need to deal with firewalls or permissions. Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|