Re: [PATCH] xread(): Fix read error when filtering >= 2GB on Mac OS X

Jonathan Nieder <jrnieder@xxxxxxxxx> · Sat, 17 Aug 2013 14:23:14 -0700

Kyle J. McKay wrote:

> According to POSIX [1] for read:
>
> If the value of nbyte is greater than {SSIZE_MAX}, the result is
> implementation-defined.

Sure.

[...]
> Since OS X still supports running 32-bit executables, and SSIZE_MAX is
> 2GB - 1 when running 32-bit it would seem the same limit has been
> imposed on 64-bit executables.  In any case, we should avoid
> "implementation-defined" behavior

Wait --- that's a big leap.

In a 64-bit executable, SSIZE_MAX is 2^63 - 1, so the behavior is not
implementation-defined.  I'm not sure if Steffen's copy of git is
32-bit or 64-bit --- my guess would be 64-bit.  So at first glance
this does look like an XNU-specific bug, not a standards thing.
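
To make the arithmetic concrete, here is a minimal sketch of the POSIX
condition in question.  The helper name exceeds_ssize_max() is made up
for illustration and is not part of git:

```c
#define _POSIX_C_SOURCE 200809L
#include <limits.h>     /* SSIZE_MAX */
#include <stddef.h>     /* size_t */
#include <sys/types.h>  /* ssize_t */

/*
 * Hypothetical helper: returns 1 when a read() request of nbyte bytes
 * falls into the "implementation-defined" territory described by
 * POSIX, i.e. when nbyte is greater than SSIZE_MAX.
 */
int exceeds_ssize_max(size_t nbyte)
{
	return nbyte > (size_t)SSIZE_MAX;
}
```

With a 64-bit ssize_t, a 2GB request is far below SSIZE_MAX, so this
returns 0 for it; with a 32-bit ssize_t the same request returns 1.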

What about the related case where someone does try to "git add"
a file with a clean filter producing more than SSIZE_MAX and less
than SIZE_MAX bytes?

strbuf_grow() does not directly protect against a strbuf growing to >
SSIZE_MAX bytes, but in practice on most machines realloc() does.  So
we could never actually read more than SSIZE_MAX bytes in the
strbuf_read() codepath, but it might be worth adding a check for
paranoia anyway.
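
Such a paranoia check could look roughly like the sketch below.  The
name growth_fits_ssize() is hypothetical; this is not the existing
strbuf code, only an illustration of an overflow-safe comparison:

```c
#define _POSIX_C_SOURCE 200809L
#include <limits.h>     /* SSIZE_MAX */
#include <stddef.h>     /* size_t */
#include <sys/types.h>  /* ssize_t */

/*
 * Hypothetical check: returns 1 if a buffer currently holding len
 * bytes can grow by extra bytes without the total exceeding
 * SSIZE_MAX, and 0 otherwise.  The subtraction form avoids computing
 * len + extra, which could wrap around in size_t.
 */
int growth_fits_ssize(size_t len, size_t extra)
{
	if (len > (size_t)SSIZE_MAX)
		return 0;
	return extra <= (size_t)SSIZE_MAX - len;
}
```

A caller would presumably die() or return an error when this yields 0,
instead of asking the allocator for the larger size.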

While we're here, it's easy to wonder: why is git reading into such a
large buffer anyway?  Normally git uses the streaming codepath for
files larger than big_file_threshold (typically 512 MiB).
Unfortunately there are cases where it doesn't.  For example:

  - convert_to_git() has not been taught to stream, so paths
    with a clean filter or requiring crlf conversion are read or
    mapped into memory.

  - deflate_to_pack() relies on seeking backward to retry when
    a pack would grow too large, so "git hash-object --stdin"
    cannot use that codepath.

  - a "clean" filter can make a file bigger.

  Perhaps git needs to learn to write to a temporary file
  when asked to keep track of a blob that is larger than fits
  reasonably in memory.  Or maybe not.

So there is room for related work but the codepaths that read()
indefinitely large files do seem to be needed, at least in the short
term.  Working around this Mac OS X-specific limitation at the read()
level like you've done still sounds like the right thing to do.
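
For readers following along, the general shape of such a workaround
is sketched below.  This is not the patch under review: the names
read_clamped() and read_fully(), and the 8 MiB cap, are assumptions
picked only for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stddef.h>     /* size_t */
#include <sys/types.h>  /* ssize_t */
#include <unistd.h>     /* read() */

/* Hypothetical per-call cap; the right value is a judgment call. */
#define READ_CHUNK_MAX ((size_t)8 * 1024 * 1024)

/*
 * Like read(), but never asks the kernel for more than
 * READ_CHUNK_MAX bytes at once and retries when interrupted.
 */
ssize_t read_clamped(int fd, void *buf, size_t len)
{
	ssize_t nr;

	if (len > READ_CHUNK_MAX)
		len = READ_CHUNK_MAX;
	do {
		nr = read(fd, buf, len);
	} while (nr < 0 && errno == EINTR);
	return nr;
}

/*
 * Reads up to len bytes by looping over read_clamped().  Stores the
 * number of bytes read in *total.  Returns 0 on success (including
 * early EOF) and -1 on a read error.
 */
int read_fully(int fd, void *buf, size_t len, size_t *total)
{
	char *p = buf;

	*total = 0;
	while (*total < len) {
		ssize_t nr = read_clamped(fd, p + *total, len - *total);
		if (nr < 0)
			return -1;
		if (nr == 0)
			break;	/* EOF */
		*total += (size_t)nr;
	}
	return 0;
}
```

Because callers already have to cope with short reads, clamping the
request size does not change what they see; a large read simply takes
more iterations of the loop.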

Thanks, both, for your work tracking this down.  Hopefully the next
version of the patch will be in good shape and then it can be applied
quickly.

Thanks and hope that helps,
Jonathan