Re: Incremental CVS update

On 6/22/06, Martin Langhoff <martin.langhoff@xxxxxxxxx> wrote:
On 6/23/06, Jon Smirl <jonsmirl@xxxxxxxxx> wrote:
> cvsps keeps its incremental status in ~/.cvsps/*. parsecvs might want
> to keep its status in the .git repository and use tags to locate it.
> You could even have a utility to show when and what was imported. By
> keeping everything in git it doesn't matter who runs the incremental
> update commands.

Jon,

what cvsps keeps is a cache of what it knows about the repo history,
to ask only for new commits. Now, cvsps will always write to STDOUT
the full history, and git-cvsimport discards the commits it has
already seen, based on reading the state of each git head.
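A minimal sketch of the filtering step described above, assuming each patchset carries a branch name and a timestamp, and that "already seen" means "not newer than the latest commit on that git head". The names `new_patchsets`, `head_dates`, `branch` and `date` are hypothetical, and this is not the actual git-cvsimport code:

```python
def new_patchsets(patchsets, head_dates):
    """Yield only the patchsets that are newer than the tip of their branch.

    patchsets  -- iterable of dicts with (assumed) keys "branch" and "date"
    head_dates -- dict mapping branch name to the date of its latest git commit
    """
    for ps in patchsets:
        last = head_dates.get(ps["branch"])
        # A branch with no git head yet has nothing imported, so keep everything.
        if last is None or ps["date"] > last:
            yield ps
```

The point is that the only state consulted is what is already in the git repository; the full history can be replayed every time and the old part is simply dropped.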

The cache is 723MB for the Mozilla repo. Since the info gets cached in
my home directory, anyone else who needs to sync the repo doesn't get
to use the cache.

[jonsmirl@jonsmirl .cvsps]$ pwd
/home/jonsmirl/.cvsps
[jonsmirl@jonsmirl .cvsps]$ ls -l
total 707492
-rw-rw-r-- 1 jonsmirl jonsmirl 723758657 Jun 15 16:10 #home#mozcvs##mozilla
[jonsmirl@jonsmirl .cvsps]$


Keith is rewriting parsecvs. If you analyze all of the data
structures, the info needed for the conversion should be able to fit
into well under 100MB instead of the ~2GB the current programs are
using.

There are lots of ways to reduce memory consumption. You can turn CVS
revisions into git IDs as soon as the revision is seen. That lets you
get away from tracking file names and long CVS revision numbers. It
also works to turn the author/log fields immediately into a hash. Where
possible, switching from linked lists to arrays saves space too.
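A rough sketch of the author/log hashing and the array idea, assuming SHA-1 as the hash and a small integer id per distinct author/log pair. `ChangesetTable`, `add_revision` and `rev_changeset` are hypothetical names, and none of this is taken from parsecvs:

```python
import hashlib
from array import array


class ChangesetTable:
    """Stores each distinct (author, log) pair once, keyed by its hash."""

    def __init__(self):
        self._index = {}  # 20-byte SHA-1 digest -> small integer id

    def intern(self, author, log):
        # Hash immediately so the long author/log strings need not be kept.
        digest = hashlib.sha1(author.encode() + b"\0" + log.encode()).digest()
        return self._index.setdefault(digest, len(self._index))

    def __len__(self):
        return len(self._index)


changesets = ChangesetTable()
rev_changeset = array("I")  # one packed unsigned int per revision, no list nodes


def add_revision(author, log):
    """Record a revision and return its index in the packed array."""
    rev_changeset.append(changesets.intern(author, log))
    return len(rev_changeset) - 1
```

With roughly 1M revisions and 200K author/log combos, each revision then costs one array slot for this field instead of its own copy of the strings.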

Some stats:
1M revisions
200K unique changesets (author/log combos)
200KB symbols
1,800 branches

cvsps has the lowest memory consumption; it uses 1200 bytes per
revision. It looks like it is possible to lower this to less than 100
bytes per rev.
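To put the per-revision figures in terms of totals, here is the arithmetic for the 1M revisions in the stats above. The helper name `total_mb` is made up for illustration, and MB here means 10^6 bytes:

```python
REVISIONS = 1_000_000  # "1M revisions" from the stats above


def total_mb(bytes_per_rev, revisions=REVISIONS):
    """Total memory in MB (10**6 bytes) for a given per-revision cost."""
    return bytes_per_rev * revisions / 10**6


print(total_mb(1200))  # 1200.0, i.e. about 1.2GB at the current cvsps cost
print(total_mb(100))   # 100.0, i.e. the 100MB target
```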


So cvsps + git-cvsimport don't keep any extra data around, and I am
100% certain that parsecvs doesn't need that either.

cheers,


martin



--
Jon Smirl
jonsmirl@xxxxxxxxx