On 05/12/2017 12:56 PM, John Ferlan wrote: > > > On 05/12/2017 03:29 AM, Michal Privoznik wrote: >> On 05/05/2017 04:48 PM, Daniel P. Berrange wrote: >>> On Fri, May 05, 2017 at 01:25:34PM +0200, Michal Privoznik wrote: >>>> On 05/04/2017 11:29 PM, John Ferlan wrote: >>>>> >>>>> >>>>> On 04/20/2017 06:01 AM, Michal Privoznik wrote: >>>>>> This API can be used to tell the other side of the stream to skip >>>>> >>>>> s/can be/is (unless it can be used for something else ;-)) >>>>> >>>>>> some bytes in the stream. This can be used to create a sparse >>>>>> file on the receiving side of a stream. >>>>>> >>>>>> It takes just one argument @length, which says how big the hole >>>>>> is. Since our streams are not rewindable like regular files, we >>>>>> don't need @whence argument like seek(2) has. >>>>> >>>>> lseek is an implementation detail... However, it could be stated that >>>>> the skipping would be from the current point in the file forward by some >>>>> number of bytes. It's expected to be used in conjunction with code that >>>>> is copying over the real (or non-zero) data and should be considered an >>>>> optimization over sending zere data segments. >>>>> >>>>>> >>>>>> Signed-off-by: Michal Privoznik <mprivozn@xxxxxxxxxx> >>>>>> --- >>>>>> include/libvirt/libvirt-stream.h | 3 +++ >>>>>> src/driver-stream.h | 5 ++++ >>>>>> src/libvirt-stream.c | 57 ++++++++++++++++++++++++++++++++++++++++ >>>>>> src/libvirt_public.syms | 1 + >>>>>> 4 files changed, 66 insertions(+) >>>>>> >>>>> >>>>> While it would be unused for now, should @flags be added. Who knows >>>>> what use it could have, but avoids a new Flags API, but does cause a few >>>>> other wording changes here. >>>> >>>> Ah sure. We should have @flags there. Good point. >>>> >>>>> >>>>> Perhaps it's just me - but "Skip" and "HoleSize" just seem awkward. >>>>> Would "virStreamSetSkip" and "virStreamGetSkip" be more apropos? (or >>>>> s/Skip/HoleSize/ - ewww). Names would then follow our more recent >>>>> function naming guidelines. I think I dislike the HoleSize much more >>>>> than the Skip. >>>> >>>> SetSkip and GetSkip sound wrong to me instead :D >>>> >>>>> >>>>>> diff --git a/include/libvirt/libvirt-stream.h b/include/libvirt/libvirt-stream.h >>>>>> index bee2516..4e0a599 100644 >>>>>> --- a/include/libvirt/libvirt-stream.h >>>>>> +++ b/include/libvirt/libvirt-stream.h >>>>>> @@ -50,6 +50,9 @@ int virStreamRecvFlags(virStreamPtr st, >>>>>> size_t nbytes, >>>>>> unsigned int flags); >>>>>> >>>>>> +int virStreamSkip(virStreamPtr st, >>>>>> + unsigned long long length); >>>>> >>>>> Was there consideration for using 'off_t' instead of ULL? I know it's an >>>>> implementation detail of virFDStreamData and lseek() usage, but it does >>>>> hide things... IDC either way. >>>> >>>> The problem with off_t is that it is signed type, while ULL is unsigned. >>>> There's not much point in sending a negative offset, is there? >>>> Moreover, we use ULL for arguments like offset (not sure really why). >>>> Frankly, I don't really know why. Perhaps some types don't exist everywhere? >>> >>> If anything, we would use size_t, for consistency with the Send/Recv >>> methods. >> >> So I've given this some though and ran some experiments. On a 32bit arch >> I've found this: >> >> long long 8 signed >> size_t 4 unsigned >> ssize_t 4 signed >> off_t 4 signed >> >> So size_t is 4 bytes long and long long is 8 bytes. This got me >> thinking, size_t type makes sense for those APIs where we need to >> address individual bytes. But what would happen if I have the following >> file on a 32 bit arch: >> >> [2MB data] -> [5GB hole] -> [2M data] > > Would a 5.4G file exist on a 32b arch since the OS couldn't address it > nor could it get anything beyond 4GB-1 as a size and then only if using > unsigned as the sizing... It's been far too long since I worked on 32 > bit arches (or less). The 90's are over aren't they ;-) Sure it can. Kernel might use long long internally. It definitely has to use bigger type than size_t otherwise we couldn't have >4GB files. But we can. BTW there are other ways to detect holes in files. For instance 'cp' uses ioctl(fd, FS_IOC_FIEMAP, fiemap) to get map of extents on the disk. The fiemap struct then uses uint64_t type to represent addresses and lengths for extents. Therefore I'm gonna stick with my decision and have this unsigned long long. The other argument for using that is: we want ULL to be used at RPC level, right? However, this might clash when mixing 32 and 64 bit client/server as decoding into size_t might fail - ULL does not fit into size_t on 32 bits. Here's an example of a big sparse file: rpi ~ # truncate -s 5G bla.raw rpi ~ # ls -lhs bla.raw 0 -rw-r--r-- 1 root root 5.0G May 12 14:57 bla.raw rpi ~ # uname -a Linux rpi 4.1.15-v7+ #830 SMP Tue Dec 15 17:02:45 GMT 2015 armv7l ARMv7 Processor rev 5 (v7l) BCM2709 GNU/Linux rpi ~ # /home/zippy/tmp/sparse/sparse_map bla.raw hole: 0 len: 5368709120 end: 5368709120 sparse_map is a program I've written for getting data/hole map of a given file. You can find it here: https://pastebin.com/SSjAUi3q BTW: try changing those (long long) typecasts to (size_t) and you'll immediately see why we need long long instead of size_t. Michal -- libvir-list mailing list libvir-list@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/libvir-list