Nicolas Pitre <nico@xxxxxxx> writes:

> And the zlib header contains a CRC which we're about to use for
> validating the data when doing delta data reuse in order to prevent pack
> corruption propagation like the one recently posted on the list.

Ah, I never thought of using the CRC directly.  I was thinking about
inflating into void and seeing if it succeeds, which as you say is
perhaps quite expensive.

This brings me back to my pet peeve, though.  I do not think a zlib
stream seeks back and leaves some clue at the beginning to tell me
the deflated length, so it is quite hard to find where each deflated
stream ends in a packfile cheaply.  Loose objects (with new or legacy
style header) are easy (st.st_size is available), but for packed
objects I cannot think of a way short of building a reverse index of
the pack .idx file, which means I am already talking about a
not-so-cheap way X-<.

It might be a reason to define a new .idx format.  We could lift the
32-bit offset limitation while we are at it.  Each entry could have a
20-byte hash, a 64-bit offset into the corresponding .pack, and a
32-bit deflated length (heh, why not make it 64-bit while we are at
it).  Luckily, .idx _is_ a local matter so we can even have a flag day
and tell people to run the updated index-pack on existing packfiles to
regenerate .idx.

> Using an offset instead of a sha1 to reference a delta base object is
> certainly a good idea though. But I'd use the same variable encoding as
> the object size to avoid the 32-bit limit issue. When generating a thin
> pack the real sha1 of the delta object could be substituted for the
> offset quite easily if the base object is not sent as part of the same
> pack.

That sounds like quite a reasonable suggestion.  I love this kind of
moment when I find us very fortunate to have bright people on the
list ;-).
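
To make the trade-off above concrete, here is a rough sketch in Python (not git code).  It shows the expensive way of finding where a deflated stream ends, by inflating it and counting the consumed bytes, next to a fixed-size index entry that would record that length up front.  The entry layout (20-byte hash, 64-bit offset, 32-bit deflated length, network byte order) follows the proposal only loosely; the names ENTRY, deflated_length, pack_entry and unpack_entry are made up for illustration, and the buffer here holds bare zlib streams, without the per-object header a real packfile has.

```python
import hashlib
import struct
import zlib

# Hypothetical entry layout, assumed for illustration only:
# 20-byte hash, 64-bit offset into the .pack, 32-bit deflated length,
# all in network byte order with no padding (32 bytes per entry).
ENTRY = struct.Struct(">20sQI")


def deflated_length(buf, offset):
    """Inflate the zlib stream starting at offset; return the bytes it occupies.

    This is the expensive route: the whole stream has to be inflated
    just to learn where it ends.
    """
    d = zlib.decompressobj()
    d.decompress(buf[offset:])          # inflated data is thrown away
    if not d.eof:
        raise ValueError("truncated zlib stream")
    # Whatever follows the end of the stream is left in unused_data.
    return len(buf) - offset - len(d.unused_data)


def pack_entry(sha1, offset, length):
    """Serialize one hypothetical index entry."""
    return ENTRY.pack(sha1, offset, length)


def unpack_entry(raw):
    """Return (sha1, offset, deflated_length) from a serialized entry."""
    return ENTRY.unpack(raw)


if __name__ == "__main__":
    first = zlib.compress(b"hello" * 100)
    second = zlib.compress(b"world" * 50)
    buf = first + second                # two streams back to back

    length = deflated_length(buf, 0)
    entry = pack_entry(hashlib.sha1(b"hello" * 100).digest(), 0, length)
    print(len(entry), unpack_entry(entry)[1:])
```

With the length stored in the entry, a reader could slice out the deflated bytes directly and inflate only when it actually needs the contents.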