Re: git-cvsimport-3 and incremental imports

On Mon, Jan 21, 2013 at 06:28:53AM -0500, Eric S. Raymond wrote:
> John Keeping <john@xxxxxxxxxxxxx>:
>> But this is nothing more than a sticking plaster that happens to do
>> enough in this particular case
> 
> I'm beginning to think that's the best outcome we ever get in this
> problem domain...

I don't think we can ever get a perfect outcome, but it should be
possible to do a little bit better without too much effort.

>>                    - if the Git repository happened to be on
>> a different branch, the start date would be wrong and too many or too
>> few commits could be output.  Git doesn't detect that the commits are
>> identical to some that we already have because we're explicitly telling
>> it to make a new commit with the specified parent.
> 
> Then I don't understand the actual failure case.  Either that or you
> don't understand the effect of -i. Have you actually experimented with
> it?  The reason I suspect you don't understand the feature is that
> which repository branch is active at the time of the second import
> shouldn't make any difference to the way -i works.
> 
> Here is how I model what is going on:
> 
> 1. We make commits to multiple branches of a CVS repo up to some given time T.
> 
> 2. We import it, ending up with a collection of git branches all of which 
>    have tip commits dated T or earlier. And *every* commit dated T or earlier
>    gets copied over.
>
> 3. We make more commits to the same set of branches in CVS.
> 
> 4. We now run cvsps -d T on the repo. This generates an incremental
>    fast-import stream describing all CVS commits *newer* than T (see
>    the cvsps manual page).

This is the problem step.  There are two scenarios that have problems:

1. If I create a new development branch in my Git repository and commit
   something to it then git-cvsimport-3 will pass a time to cvsps that
   is newer than the actual time of the last import, so T is wrong.

   It may be possible to fix this case purely in git-cvsimport-3.

2. If the branch I have checked out is not the newest CVS branch, then
   git-cvsimport-3 will pass a value of T that is before the time of the
   last import.  This case is more subtle, but it results in unwanted
   duplicate commits because git-fast-import will just do what it's told
   and create the new commits.

   So if we have the following commits:

     commit1 at time 1
     commit2 at time 2
     commit3 at time 3

   and I call "cvsps -d 2 -i" I end up with the series:

     commit1 at time 1
     commit2 at time 2
     commit3 at time 3
     commit2 at time 2 - effectively reverting the previous commit
     commit3 at time 3 - a duplicate
     ... and potentially genuinely new commits

   This is demonstrated by running the Git test t9650.
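To make the two failure modes concrete, here is a small Python model of how T might be picked, assuming the importer knows each branch's tip commit time. The branch names, the times and both helper functions are hypothetical illustrations, not git-cvsimport-3's actual code.

```python
# Hypothetical tip commit times, in the small integer "times" of the example
# above, for the branches in the Git repository after the first import.
tip_times = {
    "master": 3,          # newest CVS branch, imported up to time 3
    "old-cvs-branch": 2,  # older CVS branch whose last commit was at time 2
    "my-dev": 5,          # local development branch, never seen by CVS
}
imported = ["master", "old-cvs-branch"]  # refs the importer itself created


def t_from_checked_out(tips, head):
    """The fragile choice: take T from whichever branch is checked out."""
    return tips[head]


def t_from_imported(tips, imported_refs):
    """One alternative: T is the newest tip among the imported branches only."""
    return max(tips[name] for name in imported_refs)


print(t_from_checked_out(tip_times, "my-dev"))          # 5: too new (scenario 1)
print(t_from_checked_out(tip_times, "old-cvs-branch"))  # 2: too old (scenario 2)
print(t_from_imported(tip_times, imported))             # 3
```

Restricting the calculation to imported refs avoids both scenarios in this model, but it still relies on the importer knowing exactly which refs it owns.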

I also disagree that cvsps outputs only commits *newer* than T: it also
outputs commits *at* T, which is what the patch in my previous message
changes.  That fixes the duplicate commit2 in the series above, but not
the duplicate commit3.
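The effect of the date boundary can be modelled in a few lines of Python. `select_since` is a hypothetical stand-in for cvsps's date filter, not its real implementation; it only shows which of the example commits each variant would emit again when T is stale.

```python
# The example history from scenario 2, already present in Git.
history = [("commit1", 1), ("commit2", 2), ("commit3", 3)]


def select_since(commits, t, inclusive):
    """Model a date filter: keep commits at or after t when inclusive,
    otherwise only commits strictly after t."""
    if inclusive:
        return [c for c in commits if c[1] >= t]
    return [c for c in commits if c[1] > t]


# T = 2 is stale: the last import really covered everything up to time 3.
print(select_since(history, 2, inclusive=True))   # [('commit2', 2), ('commit3', 3)]
print(select_since(history, 2, inclusive=False))  # [('commit3', 3)]
# With the correct T, nothing already imported is selected again.
print(select_since(history, 3, inclusive=False))  # []
```

An exclusive boundary removes the duplicate commit2, but only a correct T removes commit3 as well.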

> 5. That stream should consist of a set of disconnected branches, each
>    (because of -i) beginning with a root commit containing "from
>    refs/heads/foo^0" which says to parent the commit on the tip of
>    branch foo, whatever that happens to be.  (I don't have to guess
>    about this, I tested the feature before shipping.)
> 
> 6. Now, when git fast-import interprets that stream in the context of
>    the repository produced in step 2, for each branch in the
>    incremental dump the branch root commit is parented on the tip
>    commit of the same branch in the repo.
>  
> At step 6, it shouldn't matter at all which branch is active, because
> where an incremental branch root gets attached has nothing to do with
> which branch is active. 
> 
> It is sufficient to avoid duplicate commits that cvsps -d 0 -d T and
> cvsps -d T run on the same CVS repo operate on *disjoint sets* of CVS
> file commits.  I can see this technique possibly getting confused if T
> falls in the middle of a changeset where the CVS timestamps for the
> file commits are out of order.  But that's the same case that will
> fail if we're importing at file-commit granularity, so there's no new
> bug here.
> 
> Can you explain at what step my logic is incorrect?
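Steps 5 and 6 can be made concrete with a Python sketch of what the start of one branch in the incremental stream could look like. The `from refs/heads/<branch>^0` form is the one the git-fast-import documentation recommends for restarting an incremental import from a branch's current tip; the helper function, branch name, committer, timestamps and file contents are made up for illustration and are not cvsps's actual output.

```python
def fast_import_commit(branch, mark, committer, when, message, path, content,
                       restart_from_tip):
    """Render one commit in git fast-import's stream format."""
    msg = message.encode("utf-8")
    body = content.encode("utf-8")
    lines = [
        f"commit refs/heads/{branch}",
        f"mark :{mark}",
        f"committer {committer} {when} +0000",
        f"data {len(msg)}",  # byte count of the raw message that follows
        message,
    ]
    if restart_from_tip:
        # Parent the first new commit on the branch's existing tip. The ^0 is
        # needed because fast-import won't let a branch start from itself.
        lines.append(f"from refs/heads/{branch}^0")
    lines += [
        f"M 100644 inline {path}",
        f"data {len(body)}",  # byte count of the file contents
        content,
    ]
    return "\n".join(lines) + "\n"


who = "A U Thor <author@example.com>"
stream = (
    fast_import_commit("foo", 1, who, 1358767733, "first new CVS change",
                       "README", "hello\n", restart_from_tip=True)
    # Later commits on the same branch omit "from"; fast-import then uses
    # the branch tip it already holds in memory (mark :1 here).
    + fast_import_commit("foo", 2, who, 1358767800, "second new CVS change",
                         "README", "hello again\n", restart_from_tip=False)
)
print(stream, end="")
```

Fed to `git fast-import`, the first commit is parented on the tip of `foo` regardless of which branch HEAD points to, which is Eric's point. But fast-import builds whatever commits it is given, so commits selected by a stale T are still created a second time.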

Your logic is correct for cvsps; the problem is where T comes from.

Perhaps it is simplest to just save a CVS_LAST_IMPORT_TIME file in
$GIT_DIR and not worry about it any more.
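The suggestion above could look roughly like this in Python. The file name comes from the message; the helper functions, the integer-seconds format and the use of `os.replace` for an atomic update are assumptions for illustration. In a real importer `git_dir` would be the repository's `$GIT_DIR` (for example, as printed by `git rev-parse --git-dir`).

```python
import os
import tempfile

# File name from the suggestion above; the format stored in it is assumed.
STATE_FILE = "CVS_LAST_IMPORT_TIME"


def read_last_import(git_dir):
    """Return the recorded import time, or None if nothing was recorded yet."""
    path = os.path.join(git_dir, STATE_FILE)
    try:
        with open(path, encoding="ascii") as f:
            return int(f.read().strip())
    except FileNotFoundError:
        return None


def record_import(git_dir, when):
    """Store the time via a temporary file plus rename, so an interrupted
    write does not leave a truncated value behind."""
    path = os.path.join(git_dir, STATE_FILE)
    tmp = path + ".tmp"
    with open(tmp, "w", encoding="ascii") as f:
        f.write(f"{when}\n")
    os.replace(tmp, path)


# Demonstration against a throwaway directory standing in for $GIT_DIR.
with tempfile.TemporaryDirectory() as demo_dir:
    print(read_last_import(demo_dir))   # None: first import
    record_import(demo_dir, 1358767733)
    print(read_last_import(demo_dir))   # 1358767733
```

Which time to record, such as the newest CVS commit time the run actually processed, is a separate decision that this sketch leaves open.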


John