On Mon, 2016-02-01 at 14:50 +0000, Daniel P. Berrange wrote: > On Fri, Jan 29, 2016 at 02:26:51PM +0100, Michal Privoznik wrote: > > ** NOT TO BE MERGED UPSTREAM ** > > > > This is merely an RFC. > > > > What's the problem? > > We have APIs for transferring disk images from/to host. Problem is, > > disk images > > can be sparse files. Our code is, however, unaware of that fact so > > if for > > instance the disk is one big hole of 8GB all those bytes have to: > > a) be read b) > > come through our event loop. This is obviously very inefficient > > way. > > > > How to deal with it? > > The way I propose (and this is actually what I like you to comment > > on) is to > > invent set of new API. We need to make read from and write to a > > stream > > sparseness aware. The new APIs are as follows: > > > > int virStreamSendOffset(virStreamPtr stream, > > unsigned long long offset, > > const char *data, > > size_t nbytes); > > > > int virStreamRecvOffset(virStreamPtr stream, > > unsigned long long *offset, > > char *data, > > size_t nbytes); > > > > The SendOffset() is fairly simple. It is given an offset to write > > @data from so > > it basically lseek() to @offset and write() data. > > The RecvOffset() has slightly complicated design - it has to be > > aware of the > > fact that @offset it is required to read from fall into a hole. If > > that's the > > case it sets @offset to new location where data starts. > > > > Are there other ways possible? > > Sure! If you have any specific in mind I am happy to discuss it. > > For instance > > one idea I've heard (from Martin) was instead of SendOffset() and > > RecvOffset() > > we may just invent our variant of lseek(). > > > > What's left to be done? > > Basically, I still don't have RPC implementation. But before I dive > > into that I > > thought of sharing my approach with you - because it may turn out > > that a > > different approach is going to be needed and thus my work would > > render useless. > > It would be intesting to see the virsh vol-upload/download client > code updated > to use the new APIs to deal with holes, so we can see how this API > design looks > from the POV of apps using libvirt. > > > Regards, > Daniel Still an RFC. Below I've included example of updated virsh vol-download client code to handle sparse files/streams as mentioned in earlier mail. This example assumes a different take on the stream APIs than discussed previously. For this exercise, let's assume that we don't change the function profiles of the existing virStream APIs; rather their behavior will be slightly different for a "sparse stream". A client will need a way to tell libvirt that it wants/expects the sparse behavior. To that purpose let's define a couple of new API's: int virStreamSetSparse(virStreamPtr stream) requests that sparse behavior be enabled for this stream. Returns 0 on success; -1 on error. Possible errors: maybe don't support enabling sparseness after data already sent/received on stream? Note: a client could request sparseness via new flag to virStreamNew() as well or instead. Will also want a: int virStreamIsSparse(virStreamPtr stream) for use inside stream driver, unless open coding a flags test is preferred? Existing Streams [and FDStreams] APIs that behave differently for sparse streams: virStreamSend(virStreamPtr stream, const char *data, size_t nbytes) will recognize a NULL @data as indicating that @nbytes contains the size of a hole in the data source rather than an error. This will be true also for stream driver streamSend() handlers: remoteStreamSend() and virFDStreamWrite() which take the same arguments as virStreamSend(). Internally when called from remoteStreamSend() with status=VIR_NET_SKIP virNetClientStreamSendPacket() will encode the hole length nbytes as the payload of a VIR_STREAM_SKIP packet as Daniel suggested earlier. virStreamRecv(virStreamPtr stream, char *data, size_t nbytes), upon encountering a hole in the incoming data stream will return the negative value of the hole size. Similarly for the stream driver streamRecv handlers remoteStreamRecv() and virFDStreamRead(). All of these functions current return bytes received [>0], EOF [== 0], or -1 or -2 to indicate error or no data on a non-blocking stream. The minimum hole size will be defined [much?] greater than 2, so a negative return value < (0 - VIR_STREAM_MIN_HOLE_SIZE) can indicate a hole. A macro to test the return value for a hole would be convenient. Prior mail discussed how a VIR_STREAM_SKIP could be propagated up the stack as an iovec with NULL iov_base to represent a hole. virNetClientStreamRecvPacket() would the return the hole as the large negative value in this model. One limitation with this approach is that the return values are declared as 'int' which will limit the size of the hole we can receive. This will require that multi-GB holes be broken up into ~2GB chunks. If this [or the use of large negative returns to represent holes] sounds too onerous, one can adopt an approach with new sparse stream APIs with an explicit @holesize out parameter for receive. I don't think this will affect the lower level implementation nor virsh's vol{Up,Down}load commands all that much. Next, virsh's cmdVol{Upload,Download)() functions use virStream{Send,Recv}All(), respectively. These functions take a stream, a data source or sink handler function and an opaque object representing the actual source or sink. virsh need only request sparse stream behavior and pass a "sparse stream aware" source or sink handler and object to use the existing APIs. As a convenience, libvirt can provide support for sparse aware source/sink handlers layered on FDStreams as follows: typedef void *virStreamSparseFilePtr; virStreamSparseFilePtr virStreamSparseFileOpen(const char *path, unsigned long long offset, unsigned long long length, int oflags) This is a public wrapper around the libvirt private virFDStreamOpenFile() et al that creates an FDStream associated with the specified file/path for use with sparse aware source/sink handlers. Open with O_RDONLY for sources, O_WRONLY+O_TRUNC... for sinks. I don't know that offset and length are needed for source and sink, but I kept them above. We'll also need a: int virStreamSparseFileClose(virStreamSparseFilePtr sparseFile) to clean things up. Libvirt can also provide [needs to while FDStreams are libvirt private] sparse aware source and sink handlers: int virStreamSparseSource(virStreamPtr st [ATTRIBUTE_UNUSED ? TBD], char *bytes, size_t nbytes, void *opaque) This is a public wrapper around virFDStreamRead() that knows that the @opaque parameter is a sparse FDStream created by virStreamSparseFileOpen(). It will pass along @bytes and @nbytes to the sparse FDStream which may return holes as large negative values. int virStreamSparseSink(virStreamPtr st [ATTRIBUTE_UNUSED ? TBD], const char *bytes, size_t nbytes, void *opaque) A public wrapper around virFDStreamWrite() that passes along its arguments to the FDStream indicated by @opaque. One more thing before looking at the virsh cmdVol{Up,Down}load() code: we may want to maintain the current, non-sparse behavior as default and require a '--sparse' option to enable sparse behavior. The following code segments represent the sparse path after parameter parsing. cmdVolDownload() sparse file/stream path: ! virStreamSparseFilePtr sparseFile; ! if ((sparseFile = virStreamSparseFileOpen(file, ! O_WRONLY|O_CREAT|O_EXCL, ! 0666)) < 0) { if (errno != EEXIST || ! (sparseFile = virStreamSparseFileOpen(file, ! O_WRONLY|O_TRUNC, ! 0666)) < 0) { vshError(ctl, _("cannot create %s"), file); goto cleanup; } } else { created = true; } if (!(st = virStreamNew(priv->conn, 0))) { vshError(ctl, _("cannot create a new stream")); goto cleanup; } if (virStorageVolDownload(vol, st, offset, length, 0) < 0) { vshError(ctl, _("cannot download from volume %s"), name); goto cleanup; } ! if (virStreamRecvAll(st, virStreamSparseSink, sparseFile) < 0) { vshError(ctl, _("cannot receive data from volume %s"), name); goto cleanup; } ! if (virStreamSparseFileClose(sparseFile) < 0) { vshError(ctl, _("cannot close file %s"), file); virStreamAbort(st); goto cleanup; } if (virStreamFinish(st) < 0) { vshError(ctl, _("cannot close volume %s"), name); goto cleanup; } ret = true; cleanup: ! // cleanup will be different for sparseFile upload, too. Changes to cmdVolUpload() will mirror those above with a data source opened O_RDONLY. I hope this is sufficient detail to explain this model -- reusing the existing with a couple of additions for sparse streams. I've started generating patches, but would like to settle on the APIs before going to far at that level. Changes to the stream driver functions will be fairly straightforward if we stick to sending and receiving holes explicitly, whatever the API. Handling hole detection and expansion for sources and sinks that don't support seeking() and when the remote end doesn't support the new protocol may contain some interesting issues. I'll be out of the office for a couple of weeks and so won't be able to work on this much during that time. Perhaps sparse stream support will be a done deal by the time I return. Regards, Lee -- libvir-list mailing list libvir-list@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/libvir-list