On 07.12.2015 20:25, Vasiliy Tolstov wrote:
> On 7 Dec 2015 at 18:13, "Daniel P. Berrange" <berrange@xxxxxxxxxx> wrote:
>>
>> On Mon, Dec 07, 2015 at 04:04:40PM +0100, Michal Privoznik wrote:
>>> On 07.12.2015 14:51, Daniel P. Berrange wrote:
>>>> On Mon, Dec 07, 2015 at 02:46:59PM +0100, Michal Privoznik wrote:
>>>>> Dear list,
>>>>>
>>>>> I'd like to hear your opinion on the following bug:
>>>>>
>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1282859
>>>>>
>>>>> Long story short, imagine the following scenario:
>>>>>
>>>>> 1. Create a 4GB file full of zeroes
>>>>> 2. virsh vol-download it
>>>>>
>>>>> What happens is that all those 4GB are transferred byte after byte
>>>>> through our RPC system. Not only does this put needless pressure on
>>>>> our event loop, it's suboptimal for the network and other resources
>>>>> too.
>>>>>
>>>>> I'd like to explore our options here, keeping in mind that the
>>>>> original volume might have been sparse and we ought to keep it
>>>>> sparse on the destination too.
>>>>>
>>>>> In the bug the reporter (Matthew Booth) suggests introducing a new
>>>>> type of RPC message that will let us keep our APIs unchanged. The
>>>>> source will scan the file for windows of zeroes bigger than some
>>>>> value. When one is found, the new type of message is passed to the
>>>>> client without the need to copy those zeroes. Yes, this is very
>>>>> similar to RLE.
>>>>>
>>>>> If we are going that way, should we enable users to put a
>>>>> compression program in between read()/write() and our RPC? And
>>>>> should we let users choose which compression program we put there?
>>>>> Because there are better compression algorithms than RLE.
>>>>
>>>> It only looks like compression if you're solely looking at the
>>>> network data transfer. A key feature of sparse support is that we
>>>> preserve the sparseness on both sides.
>>>>
>>>> I.e., if I have a sparse raw file locally and vol-upload it, it
>>>> should remain a sparse file on the server. Likewise, vol-downloading
>>>> a sparse file should let me create a sparse file locally. For this
>>>> reason the RPC program must explicitly represent data holes, and not
>>>> merely consider them a type of compression algorithm, as that would
>>>> not let us preserve the holes on both ends of the stream.
>>>
>>> Right. But how could we apply both our RLE algorithm and an external
>>> program on the same stream? Should we multiplex and send holes to the
>>> other side as they are and run the rest through the external
>>> compression program? Otherwise I don't see how we could preserve
>>> sparseness.
>>
>> I think we should just focus on sending holes in the RPC protocol
>> right now, and not try to do compression at the same time, as we need
>> to be able to represent holes in the protocol regardless of whether
>> compression is present.
>>
>
> Some time ago I already asked about this and about adding a compress
> flag to vol-upload and vol-download (I didn't have time to complete
> it). For my use case the best option is being able to send a
> compressed stream to libvirt. That would effectively solve the sparse
> file problem and also let us transfer less data; all my tests with lz4
> compression show at least about a 20% benefit compared to the original
> volume size.

Right. And as Dan pointed out, these two approaches are orthogonal to
each other.
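
For the hole part, the source side doesn't necessarily have to scan the
payload for zero windows at all: on hosts that support it, lseek() with
SEEK_DATA/SEEK_HOLE can ask the kernel directly where the allocated
sections and the holes are. Just to illustrate the idea (this is a
standalone sketch, not a patch against our stream code, and the output
format is made up), something like this walks a file and prints its
data/hole extents, which is exactly the (offset, length) information a
hole-aware RPC message would need to carry:

  #define _GNU_SOURCE
  #include <fcntl.h>
  #include <stdio.h>
  #include <stdlib.h>
  #include <unistd.h>

  int main(int argc, char **argv)
  {
      int fd;
      off_t end, pos, data, hole;

      if (argc != 2) {
          fprintf(stderr, "usage: %s FILE\n", argv[0]);
          return EXIT_FAILURE;
      }

      if ((fd = open(argv[1], O_RDONLY)) < 0) {
          perror("open");
          return EXIT_FAILURE;
      }

      end = lseek(fd, 0, SEEK_END);
      pos = 0;

      while (pos < end) {
          /* Find the start of the next data section; ENXIO means only
           * a hole is left between pos and EOF. */
          data = lseek(fd, pos, SEEK_DATA);
          if (data < 0)
              data = end;
          if (data > pos)
              printf("hole: offset=%lld length=%lld\n",
                     (long long) pos, (long long) (data - pos));
          if (data >= end)
              break;

          /* Every file has an implicit hole at EOF, so this always
           * finds the end of the current data section. */
          hole = lseek(fd, data, SEEK_HOLE);
          printf("data: offset=%lld length=%lld\n",
                 (long long) data, (long long) (hole - data));
          pos = hole;
      }

      close(fd);
      return EXIT_SUCCESS;
  }

On filesystems without SEEK_HOLE support the kernel simply reports the
whole file as one data section, so a zero-window scan would still be a
useful fallback there; but where it works, it reports only real holes,
which also sidesteps the detection question below.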
Compressing a stream of data to reduce its size is a nice feature to
have; preserving the sparseness of a file is something different
(although the way I'm intending to implement it will reduce the amount
of data sent through virStream too).

One thing that I am still wondering about is sparseness detection.
Finding a window full of zeroes in a file does not necessarily mean
that those zeroes come from a read() over a segment that's not
allocated on disk. We can certainly have a raw file that is sparse and
also contains an allocated window full of zeroes. But I guess it's okay
if we sparsify (if that's even a verb) the file even more on
volDownload or volUpload.

Michal

--
libvir-list mailing list
libvir-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/libvir-list