Hi,

On Wed, 14 Jun 2006, Shawn Pearce wrote:

> Keith Packard <keithp@xxxxxxxxxx> wrote:
> > parsecvs scans every ,v file and creates a blob for every revision of
> > every file right up front. Once these are created, it discards the
> > actual file contents and deals solely with the hash values.
> >
> > The problem is that while this is going on, the repository consists
> > solely of disconnected objects, and I can't make git-repack put those
> > into pack objects. This leaves the directories bloated, and operations
> > within the tree quite sluggish. I'm importing a project with 30000 files
> > and 30000 revisions (the CVS repository is about 700MB), and after
> > scanning the files and constructing (in memory) a complete revision
> > history, the actual construction of the commits is happening at about 2
> > per second, with about 70% of that time spent in the kernel, presumably
> > playing around in the repository.
> >
> > I'm assuming that if I could get these disconnected blobs all neatly
> > tucked into a pack object, things might go a bit faster.
>
> What about running git-update-index with .git/objects as the current
> working directory, adding all files in ??/* to the index, then running
> git-write-tree on that index and git-commit-tree on the resulting tree?
>
> When you are done, you have a bunch of orphan trees and a commit, but
> these shouldn't be very big, and I'd guess they would be pruned out by
> a repack if you don't hold a ref to the orphan commit.

Alternatively, you could construct fake trees like this:

	README/1.1.1.1
	README/1.2
	README/1.3
	...

i.e. every file becomes a directory -- containing all the versions of
that file -- in the (virtual) tree, which you can point to with a
temporary ref.

Ciao,
Dscho
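[Editor's sketch, not part of the original mail: the fake-tree idea above can be expressed with plain git plumbing. Everything here is illustrative -- the demo repository, the file contents, and the ref name refs/tmp-import are made up, and reasonably recent git commands are assumed rather than the 2006-era ones. Each pre-created blob is registered in the index as <file>/<revision> via --cacheinfo (so the working tree is never touched), the resulting tree is anchored by a temporary ref so the blobs count as reachable, and git-repack then packs them.]

```shell
#!/bin/sh
set -e

# Identity for the anchor commit (commit-tree needs a committer ident).
export GIT_AUTHOR_NAME=import GIT_AUTHOR_EMAIL=import@example.com
export GIT_COMMITTER_NAME=import GIT_COMMITTER_EMAIL=import@example.com

# Throwaway demo repository.
repo=$(mktemp -d)
git init -q "$repo" && cd "$repo"

# Stand-ins for the blobs parsecvs would have created from the ,v file.
b1=$(echo "revision 1.1 of README" | git hash-object -w --stdin)
b2=$(echo "revision 1.2 of README" | git hash-object -w --stdin)

# Fake tree: the file README becomes a directory holding its revisions.
# --cacheinfo registers a blob at a path without touching the work tree.
git update-index --add --cacheinfo 100644 "$b1" README/1.1
git update-index --add --cacheinfo 100644 "$b2" README/1.2

# Anchor the tree with a temporary ref so the blobs are reachable,
# then repack: -a packs reachable objects, -d drops the loose copies.
tree=$(git write-tree)
git update-ref refs/tmp-import "$(git commit-tree "$tree" -m 'import anchor')"
git repack -a -d -q

# Drop the anchor; the blobs stay in the pack.
git update-ref -d refs/tmp-import
```

Once the import proper starts building real commits, the blobs are served out of the pack instead of thousands of loose files under .git/objects, which is the whole point of the exercise.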