Repacking many disconnected blobs

parsecvs scans every ,v file and creates a blob for every revision of
every file right up front. Once these are created, it discards the
actual file contents and deals solely with the hash values.
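Roughly what that amounts to at the object level, expressed as plain
git commands (a sketch only; the file name and revision numbers are
made up):

    # write each revision's contents as a blob; only the SHA-1 it
    # prints is kept around afterwards
    for rev in 1.1 1.2 1.3; do
        cvs co -p -r $rev foo.c | git hash-object -w --stdin
    done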

The problem is that while this is going on, the repository consists
solely of disconnected objects, and I can't make git-repack put those
into a pack file; git-repack only packs objects reachable from some
ref, and these blobs aren't referenced by anything yet. This leaves
the object directories bloated and operations within the tree quite
sluggish. I'm importing a project with 30000 files and 30000 revisions
(the CVS repository is about 700MB); after scanning the files and
constructing a complete revision history in memory, the actual
construction of the commits happens at about 2 per second, and about
70% of that time is spent in the kernel, presumably playing around in
the repository.

I'm assuming that if I could get these disconnected blobs all neatly
tucked into a pack file, things might go a bit faster.
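One way that might be done (a sketch, untested here: git-pack-objects,
unlike git-repack, takes an explicit object list on stdin and doesn't
care about reachability, so the loose blobs can be handed to it
directly):

    # collect the names of every loose object and feed them to
    # pack-objects, which writes .git/objects/pack/pack-<sha1>.pack
    find .git/objects/[0-9a-f][0-9a-f] -type f |
        sed 's|.*objects/\(..\)/|\1|' |
        git pack-objects .git/objects/pack/pack
    # drop the loose copies that are now packed
    git prune-packed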
-- 
keith.packard@xxxxxxxxx
