Re: packs and trees

On 6/20/06, Keith Packard <keithp@xxxxxxxxxx> wrote:
> Even after spending eight hours building the changeset info it is
> still going to take it a couple of days to retrieve the versions one
> at a time and write them to git. Reparsing 50MB delta files n^2/2
> times is a major bottleneck for all three programs.

The eight hours in question *were* writing out the deltas and packing
the resulting trees. All that remained was to construct actual commit
objects and write them out.

The problem was that parsecvs's internals are structured so that this
process would take a large amount of memory, so I'm reworking the code
to free stuff as it goes along.

How about writing out all of the revisions from the CVS file using the
yacc code the first time the file is encountered and parsed? Then you
only have to track git IDs and not all of those cumbersome CVS rev
numbers. When I was profiling parsecvs the hottest parts of the code
were extracting the revisions and comparing cvs rev numbers. Since the
git IDs are fixed size they work well in arrays and with pointer
compares for sorting. With the right data structure you should be able
to eliminate the CVS rev numbers that are so slow to deal with.
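
A rough sketch of what I mean, in Python rather than parsecvs's C to keep
it short. Everything named here is hypothetical: the two arrays, the
helper functions, and the idea of treating each ID as an opaque 8-byte
integer are assumptions for illustration, not anything parsecvs does today.

```python
from array import array

# Hypothetical revision table: two parallel arrays of unsigned 64-bit values.
ids = array("Q")         # opaque 8-byte revision IDs (assumption)
timestamps = array("Q")  # commit times, seconds since the epoch

def add_revision(rev_id, timestamp):
    """Record a revision as it is written out; no CVS rev number is kept."""
    ids.append(rev_id)
    timestamps.append(timestamp)

def revisions_by_time():
    """Return revision IDs ordered by timestamp, ties broken by ID."""
    order = sorted(range(len(ids)), key=lambda i: (timestamps[i], ids[i]))
    return [ids[i] for i in order]

# Made-up sample data.
add_revision(0x1111, 1150800000)
add_revision(0x2222, 1150700000)
add_revision(0x3333, 1150750000)
ordered = revisions_by_time()
```

Ordering and comparison here only ever touch fixed-size integers, which
is the property I'm after.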

There are about 1M revisions in moz cvs. At eight bytes for an ID and
eight bytes for a timestamp that is 16MB if ordering is achieved via
arrays. All of the symbols fit into 400K including pointers to their
revision. If the revs are written out as they are encountered there is
no need to save file names, but you do need one rev structure per
file. Throw in some more memory for relationship pointers. All of this
should fit into less than 100MB RAM.
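
The arithmetic, spelled out as a sketch. The assumptions are mine:
exactly 1,000,000 revisions, decimal units, and "400K" read as 400,000
bytes. The per-file rev structures and relationship pointers are not
sized here; the point is only how much of the 100MB is left for them.

```python
REVISIONS = 1_000_000             # "about 1M", rounded (assumption)
ID_BYTES = 8
TIMESTAMP_BYTES = 8
SYMBOL_BYTES = 400 * 1000         # "400K" taken as 400,000 bytes
BUDGET_BYTES = 100 * 1000 * 1000  # the 100MB target

# ID + timestamp arrays for every revision.
revision_bytes = REVISIONS * (ID_BYTES + TIMESTAMP_BYTES)

# Arrays plus the symbol table.
fixed_bytes = revision_bytes + SYMBOL_BYTES

# What remains for per-file structures and relationship pointers.
headroom_bytes = BUDGET_BYTES - fixed_bytes
headroom_per_revision = headroom_bytes // REVISIONS

print(revision_bytes, fixed_bytes, headroom_bytes, headroom_per_revision)
```

That leaves a bit over 80 bytes per revision for everything else, under
these assumptions.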


With a rewritten parsecvs, I'm hoping to be able to steal the algorithms
from cvs2svn and stick those in place. Then work on truncating the
history so it can deal with incremental updates to the repository, which
I think will be straightforward if we stick a few breadcrumbs in the git
repository to recover state from.

--
keith.packard@xxxxxxxxx


--
Jon Smirl
jonsmirl@xxxxxxxxx