Re: Creating objects manually and repack

On Fri, 4 Aug 2006, Jon Smirl wrote:
>
> I am converting all of the revisions from each CVS file into git
> objects the first time the file is parsed. The plan was to run repack
> after each file is finished. That way it should be easy to figure out
> the deltas since everything will be a variation on the same file.

Sure. In that case, just list the object ID's in the exact same order you 
created them.

Basically, as you create them, just keep a list of all ID's you've created, 
and every (say) 50,000 objects, just do a

	echo all objects you've created | git-pack-objects new-pack

and then move the new pack into place, and remove all the loose objects 
(don't even bother using "git prune" - just basically do something like
"rm -rf .git/objects/??" to get rid of them).
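
As a rough sketch of the steps above (not the importer's actual code): the
snippet below works in a throwaway repository, uses the modern spelling
"git pack-objects" instead of "git-pack-objects", and the file name
created-ids.txt and the five stand-in revisions are made up for illustration.

```shell
#!/bin/sh
set -e

# Throwaway repository so the sketch is safe to run anywhere.
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

# Hypothetical list file: one object ID per line, in creation order.
ids=created-ids.txt
: > "$ids"

# Stand-in for the importer: write five revisions of one file as blobs
# and record each ID as it is created.
for rev in 1 2 3 4 5; do
    printf 'line one\nrevision %s\n' "$rev" |
        git hash-object -w --stdin >> "$ids"
done

# Pack exactly the listed objects. pack-objects writes
# new-pack-<hash>.pack/.idx and prints <hash> on stdout.
pack_hash=$(git pack-objects -q new-pack < "$ids")

# Move the new pack into place under the usual pack-<hash> name.
for f in new-pack-"$pack_hash".*; do
    mv "$f" ".git/objects/pack/pack-${f#new-pack-}"
done

# The pack now holds everything, so drop the loose copies. The ?? glob
# only matches the two-character fan-out directories, not pack/ or info/.
rm -rf .git/objects/??

# Sanity check: the first object is still readable, now from the pack.
git cat-file -e "$(head -n 1 "$ids")"
```

Afterwards "git count-objects -v" should report no loose objects, with all
five listed under in-pack.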

> So what's the best way to pack these objects, append them to the
> existing pack and then clean everything up for the next file? I am
> parsing 120K CVS files containing over 1M revs.

You'll want to repack every once in a while just to not ever have _tons_ 
of those loose objects around, but if you do it every 50,000 objects, 
you'll have just twenty nice pack-files once you're done, containing all 
one million objects, and you'll never have had more than ~200 files in any 
of the loose object subdirectories.

Of course, you might want to make that "every 50,000 objects" interval 
tunable, so that if you don't have a lot of memory for caching, you can 
repack a bit more often just to make each repack go faster and not 
have tons of IO.
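
One way the tunable interval could look, as a sketch only: BATCH, flush_batch
and the two list files are hypothetical names, and BATCH is set to 3 here
purely so the demo finishes instantly (the number discussed above is 50,000,
which gives 1,000,000 / 50,000 = 20 packs and about 50,000 / 256 = 195 loose
files per subdirectory at most).

```shell
#!/bin/sh
set -e

# Throwaway repository for the demo.
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

BATCH=3                  # hypothetical knob; lower it when memory is tight
batch_ids=batch-ids.txt  # IDs created since the last repack
all_ids=all-ids.txt      # every ID ever created, kept for a later full repack
: > "$batch_ids"
: > "$all_ids"
pending=0
packs=0

# Pack the current batch, move the pack into place, drop the loose objects.
flush_batch() {
    [ "$pending" -gt 0 ] || return 0
    hash=$(git pack-objects -q new-pack < "$batch_ids")
    for f in new-pack-"$hash".*; do
        mv "$f" ".git/objects/pack/pack-${f#new-pack-}"
    done
    rm -rf .git/objects/??
    cat "$batch_ids" >> "$all_ids"
    : > "$batch_ids"
    pending=0
    packs=$((packs + 1))
}

# Stand-in for the importer: seven revisions, flushed every BATCH objects.
for rev in 1 2 3 4 5 6 7; do
    printf 'shared header\nrevision %s\n' "$rev" |
        git hash-object -w --stdin >> "$batch_ids"
    pending=$((pending + 1))
    if [ "$pending" -ge "$BATCH" ]; then
        flush_batch
    fi
done
flush_batch   # pack whatever is left over at the end

echo "packs written: $packs"
```

With seven objects and BATCH=3 the batches are 3, 3 and 1 objects, so three
pack-files end up in .git/objects/pack.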

You can then do a _full_ repack to get one big pack-file, by just listing 
every object you ever created (in creation order) to git-pack-objects, and 
then you can replace all the twenty (smaller) pack-files with the 
resulting single bigger one.

In fact, at that point you no longer even need to worry about "creation 
order", since you've basically created all the deltas in the first phase, 
and regardless of ordering, when you then repack everything at the end, it 
will re-use all earlier delta information.
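
The final step might look roughly like this. It is a sketch under
assumptions: the first phase is faked with two tiny packs, all-ids.txt is a
made-up name for the list of every object created, and it relies on
"git pack-objects" reusing existing deltas by default (that is, when
--no-reuse-delta is not given).

```shell
#!/bin/sh
set -e

# Throwaway repository for the demo.
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

all_ids=all-ids.txt   # hypothetical list of every object ID ever created
: > "$all_ids"

# Stand-in for the first phase: two small packs of two objects each,
# written straight into the pack directory to keep the setup short.
for batch in a b; do
    : > batch-ids.txt
    for rev in 1 2; do
        printf 'shared header\nbatch %s revision %s\n' "$batch" "$rev" |
            git hash-object -w --stdin >> batch-ids.txt
    done
    git pack-objects -q .git/objects/pack/pack < batch-ids.txt > /dev/null
    rm -rf .git/objects/??
    cat batch-ids.txt >> "$all_ids"
done

# Remember the small packs before the big one is added next to them.
old_packs=$(ls .git/objects/pack/pack-*.pack)

# Full repack: list every object once and write a single pack.
big=$(git pack-objects -q new-pack < "$all_ids")
for f in new-pack-"$big".*; do
    mv "$f" ".git/objects/pack/pack-${f#new-pack-}"
done

# Replace the small packs: remove each one along with its index files.
for p in $old_packs; do
    rm -f "${p%.pack}".*
done

# Sanity check: every object is still reachable through the big pack.
while read -r id; do
    git cat-file -e "$id"
done < "$all_ids"
```

The big pack is moved into place before the small ones are deleted, so the
objects stay readable at every point in between.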

		Linus