On 04/28/2016 06:04 AM, Michal Privoznik wrote: > So, after couple of sleepless nights and headaches I'm proud to > announce that finally got this working. > > > What? > Our regular streams that are can be used to transfer disk images > for domains are unaware of any sparseness. Therefore they have > two limitations: > > a) transferring big but sparse image can take ages as all the > holes (interpreted by kernel as '\0') have to go through our > event loop. > b) resulting volume is not sparse even if the source was. > > > How? > I went by verified approach that linux kernel has. One way to > look at our streams is just like read() and write() with a > different names: virStreamRecv() and virStreamSend(). They even > have the same prototype (if 'int fd' is substituted with > 'virStreamPtr'). Now, holes in files are created and detected via > third API: lseek(). Therefore I'm introducing new virStreamSkip() > API that mimics the missing primitive. Now, because our streams > do not necessarily have to work over files (they are for generic > data transfer), I had to let users register a callback that is > called whenever the other side calls virStreamSkip(). > > So now that we have all three primitives, we can focus on making > life easier for our users. Nobody is actually using bare > virStreamSend() and virStreamRecv() rather than our wrappers: > virStreamSendAll() and virStreamRecvAll(). With my approach > described above just virStreamSendAll() needs to be adjusted so > that it's 'sparse file' aware. The virStreamRecvAll() will only > get the data to write (just like it is now) with skip callback > called automatically whenever needed. In order for > virStreamSendAll() to skip holes I'm introducing yet another > callback: virStreamInDataFunc(). This callback will help us to > create a map of a file: before each virStreamSend() it checks > whether we are in a data section or a hole and calls > virStreamSend() or virStreamSkip() respectively. > > Do not worry - it will all become clear once you see the code. > > Now question is - how will users enable this feature? I mean, we > have take into account that we might be talking to an older > daemon that does not know how to skip a hole. Or vice versa - > older client. > The solution I came up with is to introduce flags to APIs where > sparse streams make sense. I guess it makes sense for volume > upload and download, but almost certainly makes no sense for > virDomainOpenConsole(). > > > Code? >>From users POV they just need to pass correct argument to > 'vol-upload' or 'vol-download' virsh commands. One layer down, on > programming level they need to merely: > > st = virStreamNew(conn, 0); > virStreamRegisterSkip(st, skipFunc, &fd); > virStorageVolDownload(st, ...); > virStreamRecvAll(st, sinkFunc, &fd); > > where: > > int skipFunc(virStreamPtr st, > unsigned long long offset, void *opaque) > { > int *fd = opaque; > return lseek(*fd, offset, SEEK_CUR) == (off_t) -1 ? -1 : 0; > } > > And for uploading it's slightly more verbose - see patch 24. > > > Limitations? > While testing this on my machine with XFS, I've noticed that the > resulting map of a transferred file is not exactly the same as > the source's. Checksums are the same though. After digging deeper > I found this commit in the kernel: > > http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=055388a3188f56676c21e92962fc366ac8b5cb72 > > Thing is, as we transfer the file, we are practically just > seeking at EOF and thus creating holes. But if the hole size is > small enough, XFS will use some speculative file allocation > algorithm and eventually fully allocate the blocks even if we > intended to create a hole. This does not occur when punching a > hole into a file though. Well, I guess XFS devels have some > reasons to do that. > > This behaviour has not been observed on EXT4. > > > Notes? > Oh, patches 01-03 have been ACKed already, but are not pushed yet > because of the freeze. But since this feature build on the top of > them, I'm sending them too. > > Also the whole patch set is accessible at my github: > > https://github.com/zippy2/libvirt/tree/sparse_streams4 > > > Michal Privoznik (27): > Revert "rpc: Fix slow volume download (virsh vol-download)" > virnetclientstream: Process stream messages later > virStream{Recv,Send}All: Increase client buffer > Introduce virStreamSkip > Introduce virStreamRegisterSkip and virStreamSkipCallback > Introduce virStreamInData and virStreamRegisterInData > virNetClientStreamNew: Track origin stream > Track if stream is seekable > RPC: Introduce virNetStreamSkip > Introduce VIR_NET_STREAM_SKIP message type > Teach wireshark plugin about VIR_NET_STREAM_SKIP > daemon: Implement VIR_NET_STREAM_SKIP handling > daemon: Introduce virNetServerProgramSendStreamSkip > virnetclientstream: Introduce virNetClientStreamSendSkip > virnetclientstream: Introduce virNetClientStreamHandleSkip > remote_driver: Implement virStreamSkip > daemonStreamHandleRead: Wire up seekable stream > virNetClientStream: Wire up VIR_NET_STREAM_SKIP > virStreamSendAll: Wire up sparse streams > fdstream: Implement seek > gendispatch: Introduce @sparseflag for our calls > Introduce virStorageVol{Download,Upload}Flags > virsh: Implement sparse stream to vol-download > virsh: Implement sparse stream to vol-upload > fdstream: Suppress use of IO helper for sparse streams > daemon: Don't call virStreamInData so often > storage: Enable sparse streams for virStorageVol{Download,Upload} > > daemon/remote.c | 2 +- > daemon/stream.c | 134 +++++++++++++-- > daemon/stream.h | 3 +- > include/libvirt/libvirt-storage.h | 9 + > include/libvirt/libvirt-stream.h | 47 ++++++ > src/datatypes.h | 8 + > src/driver-stream.h | 5 + > src/fdstream.c | 156 +++++++++++++++--- > src/fdstream.h | 3 +- > src/libvirt-storage.c | 4 +- > src/libvirt-stream.c | 238 ++++++++++++++++++++++++++- > src/libvirt_internal.h | 7 + > src/libvirt_private.syms | 2 + > src/libvirt_public.syms | 7 + > src/libvirt_remote.syms | 2 + > src/remote/remote_driver.c | 41 ++++- > src/remote/remote_protocol.x | 2 + > src/rpc/gendispatch.pl | 21 ++- > src/rpc/virnetclient.c | 1 + > src/rpc/virnetclientstream.c | 308 ++++++++++++++++++++++++----------- > src/rpc/virnetclientstream.h | 10 +- > src/rpc/virnetprotocol.x | 16 +- > src/rpc/virnetserverprogram.c | 33 ++++ > src/rpc/virnetserverprogram.h | 7 + > src/storage/storage_backend.c | 12 +- > src/storage/storage_driver.c | 4 +- > src/virnetprotocol-structs | 4 + > tools/virsh-volume.c | 40 ++++- > tools/virsh.c | 79 +++++++++ > tools/virsh.h | 12 ++ > tools/virsh.pod | 6 +- > tools/wireshark/src/packet-libvirt.c | 40 +++++ > tools/wireshark/src/packet-libvirt.h | 2 + > 33 files changed, 1104 insertions(+), 161 deletions(-) > Read the code mostly for educational purposes, but also ran through Coverity... First the coverity notes: Patch 11: dissect_xdr_stream_skip() - "start" is not initialized Patch 15: virNetClientStreamHandleSkip() - if (!msg || ...) check then dereferences msg in the virReportError. So it seems !msg needs its own check and message. Couple of questions/thoughts about skipping in general. Is it possible to skip past some sort of EOF? IOW: What if skip 'offset' is larger than perceived maximum size of the file? I didn't dig line to line, but just a general thought along the lines of the either not so bright or even worse malicious code that now has a means to fill up a disk with a mostly empty file. In Patch 23, your commit message indicates enabling skips for specific backends - perhaps batter to note "local" vs "remote" type operations? For iSCSI, it's possible to configure a disk/lun such that the target file is found in the /dev/disk/by-path (mode="host"). John -- libvir-list mailing list libvir-list@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/libvir-list