Geert Bosch <bosch@xxxxxxxxxxx> wrote:
> When I import a large code-base (such as a *.tar.gz), I don't know
> beforehand how many objects I'm going to create. Ideally, I'd like
> to stream them directly into a new pack without ever having to write
> the expanded source to the filesystem.

See git-fast-import.  If you are coming from a tar, also see
contrib/fast-import/import-tars.perl.  :-)

> So for creating a large pack from a stream of data, you have to do
> the following:
>   1. write out a temporary pack file to disk without the correct count
>   2. fix up the count
>   3. read the entire temporary pack file to compute the final SHA-1
>   4. fix up the SHA-1 at the end of the file
>   5. construct and write out the index

Yes, this is exactly what git-fast-import does.  Yes, it sort of
sucks.  But it's not as bad as you think.

> There are a few ways to fix this:
>   - Have a count of 0xffffffff mean: look in the index for the count.
>     Pulling/pushing would still use regular counted pack files.
>   - Have the pack file checksum be the SHA-1 of (the count followed
>     by the SHA-1 of the compressed data of each object).  This would
>     allow step 3 to be done without reading back all the data.

I don't think it is worth it.  Aside from git-fast-import we always
know the object count before we start writing any data.  Even so,
fast-import runs quite well.

-- 
Shawn.
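
P.S.  For anyone curious what the fixup in steps 2-4 amounts to, here
is a rough sketch in Python (purely for illustration; fast-import's
real implementation is C, and the function and file names below are
made up).  It assumes a pack already on disk whose header carries a
placeholder object count and whose last 20 bytes are reserved for the
trailing checksum:

    import hashlib
    import struct

    def fixup_pack(path, object_count):
        """Patch the real object count into the pack header and
        recompute the trailing SHA-1 (illustrative only)."""
        with open(path, 'r+b') as f:
            # The pack header is 12 bytes: "PACK", version, object
            # count, all in network byte order.  The count lives at
            # offset 8; overwrite it in place.
            f.seek(8)
            f.write(struct.pack('>I', object_count))

            # The trailer is the SHA-1 of all preceding pack data, so
            # re-read everything except the final 20 bytes and hash it.
            f.seek(0, 2)
            data_len = f.tell() - 20
            f.seek(0)
            sha = hashlib.sha1()
            remaining = data_len
            while remaining > 0:
                chunk = f.read(min(65536, remaining))
                if not chunk:
                    break
                sha.update(chunk)
                remaining -= len(chunk)

            # Overwrite the reserved trailer with the real checksum.
            f.seek(data_len)
            f.write(sha.digest())

    # e.g.  fixup_pack('tmp-stream.pack', 1234)

The full re-read for the checksum is exactly the cost being discussed
above; it is only paid once, at the very end of the import.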