Re: [PATCH] Support 64-bit indexes for pack files.

Nicolas Pitre <nico@xxxxxxx> · Tue, 27 Feb 2007 00:11:20 -0500 (EST)

On Mon, 26 Feb 2007, Geert Bosch wrote:

> Why can't we do it with the current 1<<8 entry fan-out?
> This would allow increases of pack file size up to 1 TB.

I had the exact same thought while I was writing the previous mail.
It is indeed perfectly fine and would require less than 10 lines of code 
to implement.
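
As a rough sanity check of the 1 TB figure, here is a small sketch. It
assumes index offsets are currently 32 bits wide and that each of the 256
fan-out entries could address its own 32-bit range. How the extra bits
would actually be encoded is not settled in this thread, so the names
below are illustrative only.

```python
# Assumption: 32-bit offsets in the current index, 1<<8 fan-out entries.
OFFSET_BITS = 32
FANOUT_ENTRIES = 1 << 8


def addressable_bytes(offset_bits: int, buckets: int = 1) -> int:
    """Bytes reachable when each bucket spans 2**offset_bits bytes."""
    return buckets * (1 << offset_bits)


current_limit = addressable_bytes(OFFSET_BITS)                    # 4 GiB
extended_limit = addressable_bytes(OFFSET_BITS, FANOUT_ENTRIES)   # 1 TiB
```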

> BTW, here are a few issues with the current pack file format:
>  - The final SHA1 consists of the count of objects in the file
>    and all compressed data. Why? This is horrible for streaming
>    applications where you only know the count of objects at the
>    end, then you need to access *all* data to compute the SHA-1.
>    Much better to just compute a SHA1 over the SHA1s of each
>    object. That way at least the data streamed can be streamed to
>    disk. Buffering one SHA1 per object is probably going to be OK.

We always know the number of objects before actually constructing or 
streaming a pack.  Finding the best delta matches requires that we sort the 
object list by type, but for good locality we need to re-sort that list 
by recency.  So we always know the number of objects before starting to 
write since we need to have the list of objects in memory anyway.
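
To illustrate why the up-front count does not get in the way of streaming,
here is a minimal sketch of a writer that emits the 12-byte pack header
("PACK", version, object count, all big-endian) and feeds the trailing
SHA-1 incrementally as data goes out. The function name and the
pre-encoded `entries` list are assumptions for the example; real entries
would be type/size headers followed by zlib-compressed data.

```python
import hashlib
import io
import struct


def write_pack_stream(out, entries, version=2):
    """Write header, entries and trailing SHA-1; return the hex digest.

    `entries` is a list of already-encoded object entries (bytes), so
    its length is known before the first byte is written.
    """
    sha = hashlib.sha1()

    header = b"PACK" + struct.pack(">II", version, len(entries))
    out.write(header)
    sha.update(header)

    # Hash each entry as it is written; nothing has to be re-read.
    for entry in entries:
        out.write(entry)
        sha.update(entry)

    out.write(sha.digest())
    return sha.hexdigest()


buf = io.BytesIO()
digest = write_pack_stream(buf, [b"placeholder-1", b"placeholder-2"])
```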

Also the receiving end of a streamed pack wants to know the number of 
objects first if only to provide the user with some progress report.
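
On the receiving side, something along these lines would be enough to get
the count out of the first 12 bytes and drive a progress display. This is
only a sketch: the function names and the message format are hypothetical,
not what any existing tool prints.

```python
import struct


def read_object_count(stream_prefix: bytes) -> int:
    """Return the object count from the first 12 bytes of a pack stream."""
    if len(stream_prefix) < 12:
        raise ValueError("need at least 12 bytes for the pack header")
    signature, _version, count = struct.unpack(">4sII", stream_prefix[:12])
    if signature != b"PACK":
        raise ValueError("not a pack stream")
    return count


def progress_line(done: int, total: int) -> str:
    """Format a progress message (hypothetical wording)."""
    percent = 100 * done // total if total else 100
    return f"Receiving objects: {percent}% ({done}/{total})"
```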

>  - The object count is implicit in the SHA1 of all objects and the
>    objects we find in the file. Why do we need it in the first place?
>    Better to recompute it when necessary. This makes true streaming
>    possible.

Sorry, I don't follow you here.

Nicolas