Kyle J. McKay wrote: > According to POSIX [1] for read: > > If the value of nbyte is greater than {SSIZE_MAX}, the result is > implementation-defined. Sure. [...] > Since OS X still supports running 32-bit executables, and SSIZE_MAX is 2GB - > 1 when running 32-bit it would seem the same limit has been imposed on > 64-bit executables. In any case, we should avoid "implementation-defined" > behavior Wait --- that's a big leap. In a 64-bit executable, SSIZE_MAX is 2^63 - 1, so the behavior is not implementation-defined. I'm not sure if Steffen's copy of git is 32-bit or 64-bit --- my guess would be 64-bit. So at first glance this does look like an XNU-specific bug, not a standards thing. What about the related case where someone does try to "git add" a file with a clean filter producing more than SSIZE_MAX and less than SIZE_MAX bytes? strbuf_grow() does not directly protect against a strbuf growing to > SSIZE_MAX bytes, but in practice on most machines realloc() does. So in practice we could never read more than SSIZE_MAX characters in the strbuf_read() codepath, but it might be worth a check for paranoia anyway. While we're here, it's easy to wonder: why is git reading into such a large buffer anyway? Normally git uses the streaming codepath for files larger than big_file_threshold (typically 512 MiB). Unfortunately there are cases where it doesn't. For example: - convert_to_git() has not been taught to stream, so paths with a clean filter or requiring crlf conversion are read or mapped into memory. - deflate_to_pack() relies on seeking backward to retry when a pack would grow too large, so "git hash-object --stdin" cannot use that codepath. - a "clean" filter can make a file bigger. Perhaps git needs to learn to write to a temporary file when asked to keep track of a blob that is larger than fits reasonably in memory. Or maybe not. So there is room for related work but the codepaths that read() indefinitely large files do seem to be needed, at least in the short term. Working around this Mac OS X-specific limitation at the read() level like you've done still sounds like the right thing to do. Thanks, both, for your work tracking this down. Hopefully the next version of the patch will be in good shape and then it can be applied quickly. Thanks and hope that helps, Jonathan -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html