On 6/20/06, Keith Packard <keithp@xxxxxxxxxx> wrote:
> > Even after spending eight hours building the changeset info it is
> > still going to take it a couple of days to retrieve the versions one
> > at a time and write them to git. Reparsing 50MB delta files n^2/2
> > times is a major bottleneck for all three programs.
>
> The eight hours in question *were* writing out the deltas and packing the resulting trees. All that remained was to construct actual commit objects and write them out. The problem was that parsecvs's internals are structured so that this process would take a large amount of memory, so I'm reworking the code to free stuff as it goes along.

How about writing out all of the revisions from the CVS file using the yacc code the first time the file is encountered and parsed? Then you only have to track git IDs and not all of those cumbersome CVS rev numbers.

When I was profiling parsecvs the hottest parts of the code were extracting the revisions and comparing CVS rev numbers. Since the git IDs are fixed size they work well in arrays and with pointer compares for sorting. With the right data structure you should be able to eliminate the CVS rev numbers that are so slow to deal with.

There are about 1M revisions in moz cvs. At eight bytes for an ID and eight bytes for a timestamp that is 16MB if ordering is achieved via arrays. All of the symbols fit into 400K including pointers to their revision. If the revs are written out as they are encountered there is no need to save file names, but you do need one rev structure per file. Throw in some more memory for relationship pointers. All of this should fit into less than 100MB RAM.
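
To make the 16MB estimate concrete, here is a rough sketch of the kind of flat array I have in mind. This is not parsecvs code: the struct, field names, and helper functions are all made up for illustration, and the eight-byte `id` is assumed to be some compact handle for the blob already written to git (a full SHA-1 would be 20 bytes).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical fixed-size record: one per revision, 16 bytes total.
 * `id` stands in for an eight-byte handle to the revision already
 * written to git (an assumption; a full SHA-1 is 20 bytes). */
struct rev_entry {
    uint64_t id;        /* eight-byte handle for the stored revision */
    uint64_t timestamp; /* commit time, seconds since the epoch */
};

/* qsort comparator: order by timestamp, break ties by id.
 * Only fixed-size integers are compared, no CVS rev strings. */
static int rev_entry_cmp(const void *a, const void *b)
{
    const struct rev_entry *x = a;
    const struct rev_entry *y = b;

    if (x->timestamp != y->timestamp)
        return x->timestamp < y->timestamp ? -1 : 1;
    if (x->id != y->id)
        return x->id < y->id ? -1 : 1;
    return 0;
}

/* Sort a flat array of revisions in place. */
static void sort_revs(struct rev_entry *revs, size_t n)
{
    assert(revs != NULL || n == 0);
    if (n > 1)
        qsort(revs, n, sizeof(*revs), rev_entry_cmp);
}

/* Bytes needed for a flat table of n revisions. */
static size_t rev_table_bytes(size_t n)
{
    return n * sizeof(struct rev_entry);
}
```

With this layout, `rev_table_bytes(1000000)` comes to 16,000,000 bytes, which is where the 16MB figure comes from. The symbols and relationship pointers would be separate tables on top of that.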

> With a rewritten parsecvs, I'm hoping to be able to steal the algorithms from cvs2svn and stick those in place. Then work on truncating the history so it can deal with incremental updates to the repository, which I think will be straightforward if we stick a few breadcrumbs in the git repository to recover state from.
>
> --
> keith.packard@xxxxxxxxx

--
Jon Smirl
jonsmirl@xxxxxxxxx