Junio C Hamano <gitster@xxxxxxxxx> writes:

> One interesting question is which of these two types we should use
> for the size of objects Git uses.
>
> Most of the "interesting" operations done by Git require that the
> thing is in core as a whole before we can do anything (e.g. compare
> two such things to produce delta, have one in core and apply patch),
> so it is tempting that we deal with size_t, but at the lowest level
> to serve as a SCM, i.e. recording the state of a file at each
> version, we actually should be able to exceed the in-core
> limit---both "git add" of a huge file whose contents would not fit
> in-core and "git checkout" of a huge blob whose inflated contents
> would not fit in-core should (in theory, modulo bugs) be able to
> exercise the streaming interface to handle such case without holding
> everything in-core at once.  So from that point of view, even size_t
> may not be the "correct" type to use.

A few additions to the above observations.

 - We have a varint that encodes how far away, within the same
   packfile, the base object of a delta representation lies.  Both
   the encoding and the decoding sides in the current code use off_t
   to represent this offset, so we can already reference a base
   object that is far away in the same packfile (a rough decoding
   sketch is appended at the end of this message).

 - I think it is OK in practice to limit the size of individual
   objects to size_t (i.e. on a 32-bit arch, you cannot interact with
   a repository that has an object whose size exceeds 4GB).  Using
   off_t would allow occasional ultra-huge objects that can only be
   added and checked out via the streaming API on such a platform,
   but I suspect that it may become too much of a hassle to maintain.
   It may help reduce the maintenance burden if we introduced
   obj_size_t that is defined to be size_t for now, so that we can
   later swap it to ofs_t or some larger type when we know we do need
   to support objects whose size cannot be expressed in size_t (a
   sketch of what that could look like is also appended), but I do
   not offhand know what the pros-and-cons of such an approach would
   look like.

Thanks.
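
To make the size_t vs. off_t distinction concrete, here is a tiny
stand-alone C program (an illustration only, not Git code).  On a
typical 32-bit build with large-file support (_FILE_OFFSET_BITS=64)
it reports size_t as 4 bytes but off_t as 8 bytes, i.e. off_t can
describe data far larger than anything we could hold in core at once:

    #include <stdio.h>
    #include <stddef.h>
    #include <sys/types.h>

    int main(void)
    {
            /* size_t bounds what we can hold in core; off_t bounds
             * what we can seek to in a file such as a packfile. */
            printf("size_t: %zu bytes, off_t: %zu bytes\n",
                   sizeof(size_t), sizeof(off_t));
            return 0;
    }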
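
As for the first bullet, the base offset of an OFS_DELTA entry is
stored as a variable-length quantity: 7 bits per byte, the high bit
as a continuation flag, and an implicit +1 per continuation byte so
that longer encodings never overlap shorter ones.  The sketch below
mirrors what the decoder in packfile.c does, minus the overflow and
truncation checks the real code performs; the function name is made
up for illustration:

    #include <stddef.h>
    #include <sys/types.h>          /* off_t */

    /*
     * Decode the OFS_DELTA base-offset varint starting at buf.
     * Returns how far *back* from the delta's own position the base
     * object begins, and reports via *used how many bytes the
     * encoding consumed.
     */
    static off_t decode_ofs_delta_offset(const unsigned char *buf,
                                         size_t *used)
    {
            const unsigned char *p = buf;
            unsigned char c = *p++;
            off_t ofs = c & 0x7f;

            while (c & 0x80) {
                    ofs += 1;       /* implicit +1 per extra byte */
                    c = *p++;
                    ofs = (ofs << 7) + (c & 0x7f);
            }
            *used = p - buf;
            return ofs;
    }

The caller then computes base_offset = delta_offset - ofs; because
ofs is an off_t, the base can sit arbitrarily far back in the same
packfile even when size_t is only 32 bits wide.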
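
And for the obj_size_t idea, the minimal version could be nothing
more than a typedef plus the couple of helpers that depend on the
width, so that widening the type later is a change in one header.
All names below are hypothetical, not existing Git identifiers:

    #include <stddef.h>
    #include <stdint.h>             /* SIZE_MAX */

    /*
     * An object size is a size_t for now.  If we later need objects
     * whose size cannot be expressed in size_t, this typedef, the
     * maximum, and the printf format are the only things to change
     * (e.g. to uint64_t, SIZE_MAX -> UINT64_MAX, "zu" -> PRIu64).
     */
    typedef size_t obj_size_t;
    #define OBJ_SIZE_MAX    SIZE_MAX
    #define PRI_OBJ_SIZE    "zu"    /* printf("%" PRI_OBJ_SIZE, sz) */

    /* Add two object sizes, saturating at OBJ_SIZE_MAX on overflow
     * (git proper would more likely die() here instead). */
    static inline obj_size_t obj_size_add(obj_size_t a, obj_size_t b)
    {
            if (OBJ_SIZE_MAX - a < b)
                    return OBJ_SIZE_MAX;
            return a + b;
    }

Call sites would then consistently say obj_size_t (and use
PRI_OBJ_SIZE in format strings) instead of a mix of unsigned long and
size_t, and only that one header moves when the final width is
decided.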