Re: Repacking many disconnected blobs

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi,

On Wed, 14 Jun 2006, Shawn Pearce wrote:

> Keith Packard <keithp@xxxxxxxxxx> wrote:
> > parsecvs scans every ,v file and creates a blob for every revision of
> > every file right up front. Once these are created, it discards the
> > actual file contents and deals solely with the hash values.
> > 
> > The problem is that while this is going on, the repository consists
> > solely of disconnected objects, and I can't make git-repack put those
> > into pack objects. This leaves the directories bloated, and operations
> > within the tree quite sluggish. I'm importing a project with 30000 files
> > and 30000 revisions (the CVS repository is about 700MB), and after
> > scanning the files, and constructing (in memory) a complete revision
> > history, the actual construction of the commits is happening at about 2
> > per second, and about 70% of that time is in the kernel, presumably
> > playing around in the repository.
> > 
> > I'm assuming that if I could get these disconnected blobs all neatly
> > tucked into a pack object, things might go a bit faster.
> 
> What about running git-update-index using .git/objects as the
> current working directory and adding all files in ??/* into the
> index, then git-write-tree that index and git-commit-tree the tree.
> 
> When you are done you have a bunch of orphan trees and a commit
> but these shouldn't be very big and I'd guess would prune out with
> a repack if you don't hold a ref to the orphan commit.

Alternatively, you could construct fake trees like this:

README/1.1.1.1
README/1.2
README/1.3
...

i.e. every file becomes a directory -- containing all the versions of that 
file -- in the (virtual) tree, which you can point to by a temporary ref.

Ciao,
Dscho

-
: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]