On Wed, 14 Jun 2006 00:17:58 -0700 Keith Packard wrote:

> parsecvs scans every ,v file and creates a blob for every revision of
> every file right up front. Once these are created, it discards the
> actual file contents and deals solely with the hash values.
>
> The problem is that while this is going on, the repository consists
> solely of disconnected objects, and I can't make git-repack put those
> into pack objects. This leaves the directories bloated, and operations
> within the tree quite sluggish. I'm importing a project with 30000 files
> and 30000 revisions (the CVS repository is about 700MB), and after
> scanning the files, and constructing (in memory) a complete revision
> history, the actual construction of the commits is happening at about 2
> per second, and about 70% of that time is in the kernel, presumably
> playing around in the repository.
>
> I'm assuming that if I could get these disconnected blobs all neatly
> tucked into a pack object, things might go a bit faster.

git-repack.sh basically does:

    git-rev-list --objects --all | git-pack-objects .tmp-pack

When you have only disconnected blobs, obviously the first part does not
work - git-rev-list cannot find these blobs. However, you can do that
part manually - e.g., when you add a blob, do:

    fprintf(list_file, "%s %s\n", sha1, path);

(path should be a relative path in the repo without ",v" or "Attic" - it
is used for delta packing optimization, so getting it wrong will not
cause any corruption, but the pack may become significantly larger). You
may output some duplicate sha1 values, but git-pack-objects should handle
duplicates correctly.

Then just invoke "git-pack-objects --non-empty .tmp_pack <list_file"; it
will output the resulting pack sha1 to stdout. Then you need to move the
pack into place and call git-prune-packed (which does not use object
lists, so it should work even with unreachable objects).

You may even want to repack more than once during the import; probably the simplest way to do it is to truncate list_file after each repack and use "git-pack-objects --incremental".
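
A minimal sketch of the list-writing step described above, assuming the importer knows each ,v file's path relative to the CVS module root. The helper names cvs_path_to_repo_path() and record_blob() are hypothetical and are not taken from parsecvs; only the "sha1 path" line format comes from the fprintf call above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: map an RCS file path relative to the CVS module
 * (e.g. "src/Attic/foo.c,v") to the working-tree path ("src/foo.c") by
 * dropping a trailing ",v" and any "Attic" directory component.
 * Returns 0 on success, -1 if the result does not fit in `out`.
 */
static int cvs_path_to_repo_path(const char *rcs, char *out, size_t outlen)
{
    size_t len = strlen(rcs);
    size_t n = 0;
    const char *p = rcs;
    const char *end;

    if (outlen == 0)
        return -1;
    if (len >= 2 && strcmp(rcs + len - 2, ",v") == 0)
        len -= 2;                       /* strip the RCS suffix */
    end = rcs + len;

    while (p < end) {
        const char *slash = memchr(p, '/', (size_t)(end - p));
        size_t comp = slash ? (size_t)(slash - p) : (size_t)(end - p);
        size_t take = comp + (slash ? 1 : 0);

        /* Skip "Attic/" only when it is a directory, not a file name. */
        if (slash && comp == 5 && memcmp(p, "Attic", 5) == 0) {
            p = slash + 1;
            continue;
        }
        if (n + take >= outlen)         /* keep room for the NUL */
            return -1;
        memcpy(out + n, p, take);
        n += take;
        p += take;
    }
    out[n] = '\0';
    return 0;
}

/*
 * Hypothetical helper: append one "sha1 path" line to the object list
 * that is later fed to git-pack-objects on stdin.
 */
static int record_blob(FILE *list_file, const char *sha1_hex,
                       const char *rcs_path)
{
    char path[4096];

    assert(list_file != NULL);
    if (cvs_path_to_repo_path(rcs_path, path, sizeof(path)) < 0)
        return -1;
    return fprintf(list_file, "%s %s\n", sha1_hex, path) < 0 ? -1 : 0;
}
```

Calling record_blob() once per blob as it is created, and flushing list_file before running git-pack-objects, should be enough for the scheme above. Since the path is only a delta-packing hint, a failed path conversion could also fall back to writing the bare sha1 instead of returning an error.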