Re: [PATCH 1/9] Convert pack-objects to size_t

Junio C Hamano <gitster@xxxxxxxxx> · Sun, 13 Aug 2017 15:15:41 -0700

Ramsay Jones <ramsay@xxxxxxxxxxxxxxxxxxxx> writes:

> It should be, see commit b97e911643 ("Support for large files
> on 32bit systems.", 17-02-2007), where you can see that the
> _FILE_OFFSET_BITS macro is set to 64. This asks <stdio.h> et.al.,
> to use the "Large File System" API and a 64-bit off_t.

Correct.  Roughly speaking, we should view off_t as size of things
you can store in a file, and size_t as size of things you can have
in core.  When we allocate a region of memory, and then repeatedly
fill it with some data and hand that memory to a helper function
e.g. git_deflate(), each of these calls should expect to get data
whose size can be representable by size_t, and if that is shorter
than ulong which we currently use, we are artificially limiting our
potential by using a type that is narrower than necessary.  

The result from these helper functions that are repeatedly called
may be sent to a file as the output from the loop.  If that logic in
the outer loop wants to keep a tally of the total size of data they
processed, that number may not be fit in size_t and instead may
require off_t.

One interesting question is which of these two types we should use
for the size of objects Git uses.  

Most of the "interesting" operations done by Git require that the
thing is in core as a whole before we can do anything (e.g. compare
two such things to produce delta, have one in core and apply patch),
so it is tempting that we deal with size_t, but at the lowest level
to serve as a SCM, i.e. recording the state of a file at each
version, we actually should be able to exceed the in-core
limit---both "git add" of a huge file whose contents would not fit
in-core and "git checkout" of a huge blob whose inflated contents
would not fit in-core should (in theory, modulo bugs) be able to
exercise the streaming interface to handle such case without holding
everything in-core at once.  So from that point of view, even size_t
may not be the "correct" type to use.