On 14/08/17 18:08, Junio C Hamano wrote:
> Junio C Hamano <gitster@xxxxxxxxx> writes:
>
>> One interesting question is which of these two types we should use
>> for the size of objects Git uses.
>>
>> Most of the "interesting" operations done by Git require that the
>> thing is in core as a whole before we can do anything (e.g. compare
>> two such things to produce a delta, or have one in core and apply a
>> patch), so it is tempting to deal in size_t; but at the lowest
>> level of an SCM, i.e. recording the state of a file at each
>> version, we actually should be able to exceed the in-core
>> limit---both "git add" of a huge file whose contents would not fit
>> in core and "git checkout" of a huge blob whose inflated contents
>> would not fit in core should (in theory, modulo bugs) be able to
>> exercise the streaming interface to handle such a case without
>> holding everything in core at once. So from that point of view,
>> even size_t may not be the "correct" type to use.
>
> A few additions to the above observations.
>
>  - We have a varint that encodes how far a delta representation of
>    an object is from its base object in the packfile. Both the
>    encoding and decoding sides in the current code use off_t to
>    represent this offset, so we can already reference a base object
>    that is far away in the same packfile.
>
>  - I think it is OK in practice to limit the size of individual
>    objects to size_t (i.e. on a 32-bit arch, you cannot interact
>    with a repository containing an object whose size exceeds 4GB).
>    Using off_t would allow occasional ultra-huge objects that can
>    only be added and checked out via the streaming API on such a
>    platform, but I suspect that it may become too much of a hassle
>    to maintain.

In a previous comment, I said that (on 32-bit Linux) it was likely that an object of >4GB could not be handled correctly anyway (more likely >2GB). This was based on the code from (quite some) years ago.
In particular, it predates your addition of the "streaming API". So, maybe a 32-bit arch _should_ be able to handle objects as large as the LFS API allows. (Ignoring, for the moment, that I think anybody who puts files of that size into an SCM probably gets what they deserve. :-P )

The two patches I commented on, however, changed the type of some variables from off_t to size_t. In general, the patches did not seem to make anything worse, but these particular type changes could potentially do harm; hence my comment.

(I still haven't tried the patches on my 32-bit Linux system. I only boot it up about once a week, and I would rather wait until the patches are in the 'pu' branch before testing.)

ATB,
Ramsay Jones