Erik Faye-Lund <kusmabite@xxxxxxxxx> writes:

> On Sun, Jun 12, 2011 at 11:33 PM, Junio C Hamano <gitster@xxxxxxxxx> wrote:
>> Erik Faye-Lund <kusmabite@xxxxxxxxx> writes:
>>
>>> On Fri, Jun 10, 2011 at 10:15 PM, Junio C Hamano <gitster@xxxxxxxxx> wrote:
>>>> The size of objects we read from the repository and data we try to put
>>>> into the repository are represented in "unsigned long", so that on larger
>>>> architectures we can handle objects that weigh more than 4GB.
>>>
>>> shouldn't this be "size_t" instead of "unsigned long"?
>>
>> No, this must be unsigned long as that is the internal type we use.

There are two unrelated issues you have to address if your "unsigned
long" is 32-bit and you want to handle more than 4GB of data in git.

When git holds repository data in core, it has always represented it as
a pair of <pointer to the beginning of the memory block that holds the
data, length>, where the length has been "unsigned long" from day one.
See read_sha1_file() in read-cache.c that appears in e83c516 (Initial
revision of "git", the information manager from hell, 2005-04-07).

This limits you to 4GB if your "unsigned long" is 32-bit. The right type
to use in order to enable more platforms to go beyond 4GB might be
uintmax_t, but the series you are commenting on is not about changing
that.

We have another problem, stemming from the way in which we incorrectly
used the zlib API, even on a platform where "unsigned long" is capable
of expressing sizes beyond 4GB.

In many places, we set up the state object used by the zlib API (i.e.
z_stream) to point at the "pointer to the beginning of memory block"
with its "next_in" field, and "length" with its "avail_in" field, pass
that object around in the callchain, and expect that by making repeated
calls to zlib, "next_in" would eventually progress to the end of the
data we have in core while "avail_in" would fall to zero when all data
is processed.
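
As a minimal sketch of that setup pattern (this is not code from git or
zlib): "struct fake_stream" and naive_setup() below are hypothetical
stand-ins for z_stream and the code that fills it in. Fixed-width types
are assumed so that the effect of storing a wide length in a narrower
field does not depend on the platform.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the two input fields of a z_stream.
 * The 32-bit width of avail_in is an assumption made for illustration.
 */
struct fake_stream {
	const unsigned char *next_in;
	uint32_t avail_in;
};

/* Point the stream at an in-core <pointer, length> pair in one go. */
static uint32_t naive_setup(struct fake_stream *s,
			    const unsigned char *buf, uint64_t len)
{
	assert(s);
	s->next_in = buf;
	s->avail_in = (uint32_t)len;	/* keeps only the low 32 bits */
	return s->avail_in;		/* what the library would see */
}
```

With a length of 4G+32 (that is, 2^32 + 32), naive_setup() returns 32,
which is all the library would ever be told about.
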
The "avail_in" field the zlib API gives us, however, is uInt, which is
32-bit, so this expectation is incorrect. If you have 4G+32 bytes of
data, for example, we only feed 32 bytes and stop, barfing on "corrupt"
data. That is the issue this series is about.

The approach the series takes is to wrap zlib's state object with our
own, which has its own "avail_in" field (by the way, the same issue
exists in "next_out/avail_out" on the output side) that uses the same
type as the "length" used in other parts of our system.

The type of the "avail_in" and "avail_out" fields in the wrapper needs
to be updated to match that type when you address the "other" issue and
update all the internal "length" from "unsigned long" to "uintmax_t",
but not before. And updating the rest of the system to "uintmax_t" is
not part of the scope of this series.
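
The wrapping idea can be sketched roughly as follows. This is only an
illustration under stated assumptions, not the code in the series:
"struct narrow_stream" stands in for the library's state,
narrow_consume() stands in for one library call, and the names
"struct wide_stream", wide_consume_all() and CHUNK_MAX are all made up
for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-call cap; any value that fits in 32 bits would do. */
#define CHUNK_MAX (UINT32_C(1) << 30)

/* Stand-in for the library's state: only a 32-bit "avail_in". */
struct narrow_stream {
	uint32_t avail_in;
};

/* Our wrapper: same field name, but as wide as our in-core "length". */
struct wide_stream {
	struct narrow_stream z;
	uint64_t avail_in;	/* bytes still to be fed */
	uint64_t total_in;	/* bytes consumed so far */
};

/* Stand-in for one library call; it consumes all it is offered. */
static void narrow_consume(struct narrow_stream *z)
{
	z->avail_in = 0;
}

/* Feed everything through the narrow field, one capped chunk at a time. */
static unsigned wide_consume_all(struct wide_stream *s)
{
	unsigned calls = 0;

	assert(s);
	while (s->avail_in) {
		uint32_t offered = s->avail_in > CHUNK_MAX
			? CHUNK_MAX : (uint32_t)s->avail_in;
		uint32_t used;

		s->z.avail_in = offered;
		narrow_consume(&s->z);
		used = offered - s->z.avail_in;	/* what the call took */
		s->avail_in -= used;
		s->total_in += used;
		calls++;
	}
	return calls;
}
```

For 4G+32 bytes and the 1GB cap assumed here, the loop makes five calls
(four full chunks and one of 32 bytes) and the wrapper's "avail_in"
reaches zero only after all of the data has been offered.
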