Geert Bosch <bosch@xxxxxxxxxxx> wrote:
> When I import a large code-base (such as a *.tar.gz), I don't know
> beforehand how many objects I'm going to create. Ideally, I'd like
> to stream them directly into a new pack without ever having to write
> the expanded source to the filesystem.

See git-fast-import.  If you are coming from a tar, also see
contrib/fast-import/import-tars.perl.  :-)

> So for creating a large pack from a stream of data, you have to do
> the following:
>   1. write out a temporary pack file to disk without the correct count
>   2. fix up the count
>   3. read the entire temporary pack file to compute the final SHA-1
>   4. fix up the SHA-1 at the end of the file
>   5. construct and write out the index

Yes, this is exactly what git-fast-import does.  Yes, it sort of
sucks.  But it's not as bad as you think.

> There are a few ways to fix this:
>   - Have a count of 0xffffffff mean: look in the index for the count.
>     Pulling/pushing would still use regular counted pack files.
>   - Have the pack file checksum be the SHA-1 of (the count followed
>     by the SHA-1 of the compressed data of each object).  This would
>     allow step 3 to be done without reading back all the data.

I don't think it is worth it.  Aside from git-fast-import we always
know the object count before we start writing any data.  Even so,
fast-import runs quite well.

-- 
Shawn.
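
P.S.  For anyone curious what the fixup in steps 2-4 amounts to, here
is a rough sketch in Python (purely for illustration; fast-import's
real implementation is C, and the function and file names below are
made up).  It assumes a pack already on disk whose header carries a
placeholder object count and whose last 20 bytes are reserved for the
trailing checksum:

    import hashlib
    import struct

    def fixup_pack(path, object_count):
        """Patch the real object count into the pack header and
        recompute the trailing SHA-1 (illustrative only)."""
        with open(path, 'r+b') as f:
            # The pack header is 12 bytes: "PACK", version, object
            # count, all in network byte order.  The count lives at
            # offset 8; overwrite it in place.
            f.seek(8)
            f.write(struct.pack('>I', object_count))

            # The trailer is the SHA-1 of all preceding pack data, so
            # re-read everything except the final 20 bytes and hash it.
            f.seek(0, 2)
            data_len = f.tell() - 20
            f.seek(0)
            sha = hashlib.sha1()
            remaining = data_len
            while remaining > 0:
                chunk = f.read(min(65536, remaining))
                if not chunk:
                    break
                sha.update(chunk)
                remaining -= len(chunk)

            # Overwrite the reserved trailer with the real checksum.
            f.seek(data_len)
            f.write(sha.digest())

    # e.g.  fixup_pack('tmp-stream.pack', 1234)

The full re-read for the checksum is exactly the cost being discussed
above; it is only paid once, at the very end of the import.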