Martin Koegler <martin.koegler@xxxxxxxxx> writes: > On Thu, Aug 10, 2017 at 01:07:07PM -0700, Junio C Hamano wrote: >> > The current delta code produces incorrect pack objects for files > 4GB. >> > >> > Signed-off-by: Martin Koegler <martin.koegler@xxxxxxxxx> >> >> I am a bit torn on this change. >> >> Given that this is not merely a local storage format but it also is >> an interchange format, we would probably want to make sure that the >> receiving end (e.g. get_delta_hdr_size() that is used at the >> beginning of patch_delta()) on a platform whose size_t is smaller >> than that of a platform that produced the delta stream with this >> code behaves "sensibly". > > Overflows would already be detected during unpack: > * Assuming size_t = uint32, the system should just be able to handle up to 4GB of process memory. > So loading any source blob larger than 4GB should already fail. After re-reading patch_delta(), I agree. The loading of the base object (i.e. the procedure to prepare src_buf & src_size fed to the function) would have failed already. > * Assuming size_t = uint32 and a source blob size < 4 GB, the > target blob size would be readed truncated. apply_delta checks, > that the generated result matches the encoded size - this check > would fail. When target size is truncated, we allocate much less than the real target size, and start reading the input. But patch_delta() makes sure that what is copied into the buffer, either by copying bytes literally from the pack data or by copying from src_buf, will never cause the resulting dst_buf overflow (especially after your change updates "size" to size_t), so we should be safe on this side, too. > >> If we replaced ulong we use in create/patch delta codepaths with >> uint32_t, that would be safer, just because the encoder would not be >> able to emit varint that is larger than the receivers to handle. >> But that defeats the whole point of using varint() to encode the >> sizes in the first place. It was partly done for space saving, but >> more for allowing larger sizes and larger ulong in the future >> without having to change the file format. > > The ondisk-format is able to handle larger sizes [using a slightly worse compression]. > The current implementation is just buggy. > > I would not move to uint32_t. The remaing part of git uses "unsigned long", so the > delta code could still be called with larger files. Oh, absolutely. I was merely commenting on the lack of any error checking in get_delta_hdr_size() helper function, and dismissing a naive move to limiting the file format by insisting on uint32_t as an unworkable workaround.