On 07/15/2016 12:38 AM, Jeff King wrote:
On Thu, Jul 14, 2016 at 03:30:58PM -0700, Junio C Hamano wrote:
If we move to time_t everywhere, I think we'll need an extra
TIME_T_IS_64BIT, but we can cross that bridge when we come to it.
Likewise I think we'll need SIZE_T_IS_64BIT eventually (for real 32-bit
systems; LLP64 systems like Windows will then be able to run the test).
I guess I wrote essentially the same thing before refreshing my
Inbox.
I am a bit fuzzy between off_t and size_t; the former is for the
size of things you see on the filesystem, while the latter is for
you to give malloc(3). I would have thought that off_t is the type
we would want at the end of the raw object header, denoting the size
of a blob object when deflated, which could be larger than the size
of a region of memory we can get from malloc(3), in which case we
would use the streaming interface.
Yeah, your understanding is right (s/deflated/inflated/). I agree that
off_t is probably a better size for blobs. Traditionally git assumed any
object could fit in memory. The streaming interface helps that somewhat,
but I think there are cases where we still must load a blob (e.g., if it
is stored as a delta). In theory that never happens because of
core.bigfilethreshold, but you may get a packfile from somebody with a
higher threshold than you.
I wouldn't be surprised if there are other cases that aren't smart
enough to use the streaming interface yet, but the solution there is to
make them smarter. :)
So off_t is probably better. We do need to be careful, though, when
allocating objects. E.g., this:
off_t size;
struct git_istream *stream;
void *buf;
stream = open_istream(sha1, &type, &size, NULL);
buf = xmalloc(size);
while (1) {
/* read stream into buf */
}
is a security hole when size_t is less than off_t (it gets truncated in
the call to xmalloc, which allocates too few bytes). This is a toy
example, obviously, but it's something to watch out for.
-Peff
That code is "illegal", it should be
buf = xmalloc(xsize_t(size));
And the transition from off_t into size_t
should always got via xsize_t():
static inline size_t xsize_t(off_t len)
{
if (len > (size_t) len)
die("Cannot handle files this big");
return (size_t)len;
}
There are some more things to be done, on the long run:
- convert "unsigned long" into either off_t of size_t in e.g. convert.c
- Use the streaming interface to analyze if blobs are binary
(That is already on my list, the old "stream and early out"
from the olc 10/10, gmane/$293010 or so can be reused)
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html