Re: Diff format in packs

"Martin Langhoff" <martin.langhoff@xxxxxxxxx> · Tue, 1 Aug 2006 14:16:46 +1200

On 8/1/06, Jon Smirl <jonsmirl@xxxxxxxxx> wrote:
> On 7/31/06, Martin Langhoff <martin.langhoff@xxxxxxxxx> wrote:
> > Jon,
> >
> > just get all the file versions out of the ,v file and into the GIT
> > repo, then do find .git/objects/ -type f | git-pack-objects. You don't
> > have to even think of generating the packfile yourself.
>
> Moz CVS expands into over 1M files and 12GB in size. I keep getting
> concerned about algorithms that take days to complete and need 4GB to
> run.

If you run that every 1000 rcs files converted, it will be really
cheap in processing and memory footprint. That's not a concern.
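
A rough sketch of that batching loop, in Python. Several things here are
assumptions rather than anything from this thread: import_rcs_file is a
hypothetical callback standing in for whatever writes the blobs from one
,v file into the repo, the function names are made up, and object names
are assumed to be 40-character SHA-1s. As far as I know, pack-objects
expects object names (not file paths) on stdin, so the sketch derives
them from the objects/xx/yyyy... layout before piping them in.

```python
import os
import re
import subprocess
from itertools import islice

_FANOUT = re.compile(r"^[0-9a-f]{2}$")
_REST = re.compile(r"^[0-9a-f]{38}$")


def loose_object_ids(objects_dir):
    """Yield the 40-hex name of every loose object under objects_dir."""
    for fanout in sorted(os.listdir(objects_dir)):
        sub = os.path.join(objects_dir, fanout)
        if not (_FANOUT.match(fanout) and os.path.isdir(sub)):
            continue  # skips pack/ and info/
        for name in sorted(os.listdir(sub)):
            if _REST.match(name):
                yield fanout + name


def batches(items, size):
    """Yield successive lists of at most `size` items."""
    it = iter(items)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def pack_loose_objects(git_dir):
    """Pack all current loose objects, then delete the loose copies."""
    ids = list(loose_object_ids(os.path.join(git_dir, "objects")))
    if not ids:
        return
    env = dict(os.environ, GIT_DIR=git_dir)
    base = os.path.join(git_dir, "objects", "pack", "pack")
    # pack-objects reads one object name per line on stdin
    subprocess.run(["git", "pack-objects", base],
                   input="".join(i + "\n" for i in ids).encode(),
                   env=env, check=True, stdout=subprocess.DEVNULL)
    subprocess.run(["git", "prune-packed"], env=env, check=True)


def convert(rcs_files, git_dir, import_rcs_file, batch_size=1000):
    """Import ,v files, repacking after every batch_size files."""
    for chunk in batches(rcs_files, batch_size):
        for path in chunk:
            import_rcs_file(path, git_dir)  # hypothetical importer
        pack_loose_objects(git_dir)
```

Packing per batch means only one batch's worth of loose objects exists
on disk at any time. The many small packs that result could be merged
at the end with a single repack.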

> > On 8/1/06, Jon Smirl <jonsmirl@xxxxxxxxx> wrote:
> > > I am working on combining cvs2svn, parsecvs and cvsps into something
> > > that can handle Mozilla CVS.
> >
> > If you publish your WIP somewhere, I might be able to jump in and help
> > a bit. I've seen your "challenge" email earlier, but haven't been able
> > to get started yet -- lots of work on other foss fronts.
>
> I haven't got anything useful yet, I keep switching in and out of
> working on this. I am still trying to work out a viable transition
> strategy that I can attempt to sell the Mozilla developers on. So far
> I don't have one.

I understand that, and it's a shame.

> The requirements I have so far:

Yep to 1..4. I suspect that you can get "there" with a cvs2svn
converted to deal with git, as you are pursuing, and by handling the
follow-on imports with git-cvsimport. The only real limitation there
is that new branches opened in that transition period may be imported
with the root in the wrong place.

But for "ongoing" branches, the setup works great. I've done it many
times with parsecvs for the initial import and git-cvsimport for the
subsequent incrementals.

> 5) a bonus feature would be a partial repository to avoid the initial
> 700MB git download.

Agreed. However, I thought I had gotten it to be much slimmer than
that, but I may be wrong. Also, a current Moz checkout via cvs is
massively chatty, so between bandwidth and latency, I think the git
protocol beats cvs for the initial checkout even for Moz.

> I've spent more time looking at parsecvs than cvsps, is it reasonable
> to convert cvsps to the algorithm described above? Another strategy

I don't think cvsps is easily fixable.

> would be to use cvs2svn to build the changeset database and then use
> cvsps to simply read the changesets out of it and build the git
> repository.

Once cvs2svn has the db built, it should be easy to write a
perl/python script that mimics the output of cvsps.
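
Something along these lines, as a sketch only. The Changeset and Member
classes are hypothetical stand-ins for whatever cvs2svn's database
actually stores, and the PatchSet layout is written from memory of what
git-cvsimport parses, so it should be checked against real cvsps output
before relying on it.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Member:
    path: str
    old_rev: Optional[str]  # None for a file's first revision
    new_rev: str
    dead: bool = False      # True if this revision removes the file


@dataclass
class Changeset:
    number: int
    date: str               # "YYYY/MM/DD HH:MM:SS"
    author: str
    branch: str
    log: str
    tag: Optional[str] = None
    members: List[Member] = field(default_factory=list)


def format_patchset(cs):
    """Render one changeset in an approximation of cvsps's text format."""
    lines = [
        "---------------------",
        f"PatchSet {cs.number}",
        f"Date: {cs.date}",
        f"Author: {cs.author}",
        f"Branch: {cs.branch}",
        f"Tag: {cs.tag or '(none)'}",
        "Log:",
    ]
    lines.extend(cs.log.rstrip("\n").split("\n"))
    lines.append("")
    lines.append("Members:")
    for m in cs.members:
        old = m.old_rev or "INITIAL"
        suffix = "(DEAD)" if m.dead else ""
        lines.append(f"\t{m.path}:{old}->{m.new_rev}{suffix}")
    lines.append("")  # blank line terminates the member list
    return "\n".join(lines) + "\n"
```

Writing format_patchset(cs) for each changeset, in commit order, to a
file would give something to try feeding to git-cvsimport in place of
a live cvsps run.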

> Parsecvs never finishes the conversion it always hits an error or GPF
> after 4-5 hours, probably a wild pointer somewhere.

Hmmmm. Nag Keith?

martin