On Thu, Aug 10, 2017 at 01:07:07PM -0700, Junio C Hamano wrote:
> > The current delta code produces incorrect pack objects for files > 4GB.
> >
> > Signed-off-by: Martin Koegler <martin.koegler@xxxxxxxxx>
>
> I am a bit torn on this change.
>
> Given that this is not merely a local storage format but it also is
> an interchange format, we would probably want to make sure that the
> receiving end (e.g. get_delta_hdr_size() that is used at the
> beginning of patch_delta()) on a platform whose size_t is smaller
> than that of a platform that produced the delta stream with this
> code behaves "sensibly".

Overflows would already be detected during unpack:

* Assuming size_t = uint32, the system can address at most 4GB of
  process memory, so loading any source blob larger than 4GB already
  fails.

* Assuming size_t = uint32 and a source blob size < 4GB, the target
  blob size would be read back truncated. apply_delta checks that the
  generated result matches the encoded size - this check would fail.
  (A rough sketch illustrating this is appended below.)

> If we replaced ulong we use in create/patch delta codepaths with
> uint32_t, that would be safer, just because the encoder would not be
> able to emit varint that is larger than the receivers to handle.
> But that defeats the whole point of using varint() to encode the
> sizes in the first place. It was partly done for space saving, but
> more for allowing larger sizes and larger ulong in the future
> without having to change the file format.

The on-disk format is able to handle larger sizes [at the cost of
slightly worse compression]. The current implementation is just buggy.

I would not move to uint32_t. The rest of git uses "unsigned long", so
the delta code could still be called with larger files.

We will also see more RAM as well as more CPU power - reducing the
limits just because of older platforms, which can't even handle such
large blobs, is the wrong way.

Regards,
Martin
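
To make the second point above concrete, here is a minimal, self-contained
sketch of the 7-bits-per-byte size header used for delta sizes. It is not
git's actual code: encode_size(), decode_size_32() and the 5GB example value
are made up for illustration, and the decoder deliberately uses a 32-bit
accumulator to mimic a 32-bit size_t. A 64-bit size survives the encoding,
but the 32-bit decoder sees a truncated value, so a "result matches encoded
size" check can never pass for such a delta:

  /*
   * Sketch only: delta size-header varint, 7 data bits per byte,
   * MSB set means "more bytes follow", least-significant group first.
   */
  #include <stdint.h>
  #include <stdio.h>

  /* Encode `size` into `buf`, return number of bytes written. */
  static int encode_size(uint64_t size, unsigned char *buf)
  {
          int n = 0;
          do {
                  unsigned char c = size & 0x7f;
                  size >>= 7;
                  if (size)
                          c |= 0x80;      /* continuation bit */
                  buf[n++] = c;
          } while (size);
          return n;
  }

  /* Decode with a 32-bit accumulator, mimicking size_t on a 32-bit host. */
  static uint32_t decode_size_32(const unsigned char *buf, int len)
  {
          uint32_t size = 0;
          int i, shift = 0;

          for (i = 0; i < len; i++) {
                  /* bits above 31 wrap away: the decoded size is truncated */
                  size |= (uint32_t)(buf[i] & 0x7f) << shift;
                  shift += 7;
                  if (!(buf[i] & 0x80))
                          break;
          }
          return size;
  }

  int main(void)
  {
          unsigned char buf[10];
          uint64_t real = 5ULL << 30;     /* a 5GB target size */
          int len = encode_size(real, buf);
          uint32_t seen = decode_size_32(buf, len);

          /* The result-size check catches the mismatch on the 32-bit side. */
          printf("encoded %llu, 32-bit decoder sees %u -> %s\n",
                 (unsigned long long)real, (unsigned)seen,
                 (uint64_t)seen == real ? "ok" : "size check fails, delta rejected");
          return 0;
  }

The point is only that the truncation cannot go unnoticed: the reconstructed
result can never match the truncated header size, so the receiving side bails
out instead of silently producing a wrong blob.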