Re: git-cvsimport doesn't quite work, wrt branches

On 6/14/06, Keith Packard <keithp@xxxxxxxxxx> wrote:
> On Wed, 2006-06-14 at 10:55 +1200, Martin Langhoff wrote:
>
> > In terms of history parsing, parsecvs and cvs2svn are similar. I like
> > cvs2svn "many passes" approach better, though the Python source is
> > really messy. A good thing about cvs2svn is that it is a lot more
> > conservative WRT memory use.
>
> I will try to fix parsecvs so it doesn't take so much memory. Of course,
> my goal was to import various X.org repositories which have horrible
> issues, but aren't all that huge. And, for them, it works just fine.

Would it be possible to have it parse the RCS histories from a remote repo?

> > I had forgotten, but that's something else that the cvsps +
> > git-cvsimport combo can do. In short, to replace cvsps+git-cvsimport
> > ...
> >
> > + not memory bound -- or at least must be able to import large
> > repositories (mozilla, gentoo) with a decent amount of memory
> >
> > + must work local and remote (of course local can be faster)
> >
> > + must do incrementals reasonably well
>
> I'd like some help figuring out how to do incremental imports with
> parsecvs. As parsecvs already constructs the project history from the
> present into the past, it should be possible to "notice" when it hits
> existing bits in the repository and stop automatically. I think this
> will just take saving a bit of state in the git repository to mark where
> in CVS the tips of each branch come from.

Ok. Before starting to read the RCS files, I would look at all the
branch tips in the git repo, and read some metadata of the last commit
of each head into memory (author, commitmsg, timestamp, diffstat).
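
A minimal sketch of that first step, in Python rather than parsecvs's own
code. The names TipInfo, parse_tip and read_branch_tips are made up for
illustration, and the list of paths touched by the tip commit stands in
for a real diffstat. It assumes a git new enough to understand "git -C".

```python
import subprocess
from dataclasses import dataclass


@dataclass(frozen=True)
class TipInfo:
    """Metadata of the last commit on one branch (hypothetical structure)."""
    branch: str
    author: str
    timestamp: int      # author date, seconds since the epoch
    message: str
    files: frozenset    # paths touched by the tip commit (diffstat stand-in)


def parse_tip(branch, log_output, files_output):
    """Build a TipInfo from the raw output of the two git commands below."""
    author, timestamp, message = log_output.split("\0", 2)
    files = frozenset(line for line in files_output.splitlines() if line)
    return TipInfo(branch, author, int(timestamp), message.strip(), files)


def git(repo, *args):
    """Run git in `repo` and return its standard output as text."""
    result = subprocess.run(["git", "-C", repo, *args],
                            check=True, capture_output=True, text=True)
    return result.stdout


def read_branch_tips(repo):
    """Return {refname: TipInfo} for every head in the git repository."""
    tips = {}
    refs = git(repo, "for-each-ref", "--format=%(refname)", "refs/heads")
    for ref in refs.splitlines():
        # %an = author name, %at = author timestamp, %B = raw commit message
        log_output = git(repo, "log", "-1",
                         "--format=%an%x00%at%x00%B", ref, "--")
        # --root makes the first commit of a branch list its files too
        files_output = git(repo, "diff-tree", "--root", "--no-commit-id",
                           "--name-only", "-r", ref)
        tips[ref] = parse_tip(ref, log_output, files_output)
    return tips
```

Calling read_branch_tips("/path/to/repo") once before touching any RCS
file gives a small in-memory table to compare against later. Merge
commits at a tip would show an empty file list here, which should not
matter for a straight CVS import.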

When parsing RCS files and building changesets to import, compare them
with the 'head' data. The timestamp granularity is one second, which is
pretty coarse -- you can ask for history after those timestamps, but
there's the risk of missing commits (this affects git-cvsimport today,
and I'm thinking about how to fix it there). So borderline changesets
should be compared against the metadata you have.
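
Roughly what I mean, as a Python sketch. Meta, matches_tip and
changesets_after_tip are hypothetical names, the 300-second fuzz window
is an arbitrary guess, and it assumes the changesets of one branch come
out in a stable oldest-first order.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Meta:
    """Author, timestamp and message of a commit or a CVS changeset."""
    author: str
    timestamp: int   # seconds since the epoch
    message: str


def matches_tip(changeset, tip, fuzz=300):
    """True if the changeset looks like the commit already at the branch tip."""
    return (changeset.author == tip.author
            and changeset.message.strip() == tip.message.strip()
            and abs(changeset.timestamp - tip.timestamp) <= fuzz)


def changesets_after_tip(changesets, tip, fuzz=300):
    """Return the changesets that follow the one matching the tip.

    `changesets` is ordered oldest first. Returns None when nothing
    matches, so the caller can fall back to something more careful
    instead of silently importing everything again.
    """
    for index in range(len(changesets) - 1, -1, -1):
        if matches_tip(changesets[index], tip, fuzz):
            return changesets[index + 1:]
    return None
```

The point is that a changeset sharing the tip's timestamp but carrying a
different author or message is still picked up, where a plain "newer
than the tip" filter would drop it.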

There is the chance that your earlier import caught a commit partway
through, so you may end up putting in the 'rest' of the commit. That's
why diffstat can be useful.
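
A sketch of that check in Python, using the set of touched paths as a
crude diffstat. compare_with_tip is a hypothetical helper, not something
parsecvs or git-cvsimport has.

```python
def compare_with_tip(changeset_files, tip_files):
    """Compare the files of a CVS changeset with those of the imported tip.

    Returns a (status, remaining) pair:
      "complete" - the tip commit already holds the whole changeset
      "partial"  - the tip holds only part of it; `remaining` lists the rest
      "mismatch" - the tip touches files the changeset does not, so they
                   are probably not the same commit
    """
    changeset = set(changeset_files)
    tip = set(tip_files)
    if changeset == tip:
        return ("complete", [])
    if tip < changeset:          # proper subset: the import stopped partway
        return ("partial", sorted(changeset - tip))
    return ("mismatch", [])
```

For a "partial" result the importer would commit the remaining files on
top of the existing tip. Comparing per-file revisions instead of bare
paths would be stricter, at the cost of storing more state.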

Is that useful?


cheers,

martin