On Mon, Jan 21, 2013 at 06:28:53AM -0500, Eric S. Raymond wrote:
> John Keeping <john@xxxxxxxxxxxxx>:
>> But this is nothing more than a sticking plaster that happens to do
>> enough in this particular case
>
> I'm beginning to think that's the best outcome we ever get in this
> problem domain...

I don't think we can ever get a perfect outcome, but it should be
possible to do a little bit better without too much effort.

>> - if the Git repository happened to be on
>> a different branch, the start date would be wrong and too many or too
>> few commits could be output.  Git doesn't detect that they commits are
>> identical to some that we already have because we're explicitly telling
>> it to make a new commit with the specified parent.
>
> Then I don't understand the actual failure case.  Either that or you
> don't understand the effect of -i.  Have you actually experimented with
> it?  The reason I suspect you don't understand the feature is that it
> shouldn't make any difference to the way -i works which repository branch is
> active at the time of the second import.
>
> Here is how I model what is going on:
>
> 1. We make commits to multiple branches of a CVS repo up to some given time T.
>
> 2. We import it, ending up with a collection of git branches all of which
>    have tip commits dated T or earlier.  And *every* commit dated T or earlier
>    gets copied over.
>
> 3. We make more commits to the same set of branches in CVS.
>
> 4. We now run cvsps -d T on the repo.  This generates an incremental
>    fast-import stream describing all CVS commits *newer* than T (see
>    the cvsps manual page).

This is the problem step.  There are two scenarios that have problems:

1. If I create a new development branch in my Git repository and commit
   something to it then git-cvsimport-3 will pass a time to cvsps that
   is newer than the actual time of the last import, so T is wrong.

   It may be possible to fix this case purely in git-cvsimport-3.

2. If the branch I have checked out is not the newest CVS branch, then
   git-cvsimport-3 will pass a value of T that is before the time of the
   last import.

This case is more subtle but it results in unwanted duplicate commits
since git-fast-import will just do what it's told and create the new
commits.  So if we have the following commits:

    commit1 at time 1
    commit2 at time 2
    commit3 at time 3

and I call "cvsps -d 2 -i" I end up with the series:

    commit1 at time 1
    commit2 at time 2
    commit3 at time 3
    commit2 at time 2 - effectively reverting the previous commit
    commit3 at time 3 - a duplicate
    ... and potentially genuinely new commits

This is demonstrated by running the Git test t9650.

I also disagree that cvsps outputs commits *newer* than T since it will
also output commits *at* T, which is what I changed with the patch in my
previous message.  This fixes the duplicate commit2 in the series above,
but not the duplicate commit3.

> 5. That stream should consist of a set of disconnected branches, each
>    (because of -i) beginning with a root commit containing "from
>    refs/heads/foo^0" which says to parent the commit on the tip of
>    branch foo, whatever that happens to be.  (I don't have to guess
>    about this, I tested the feature before shipping.)
>
> 6. Now, when git fast-import interprets that stream in the context of
>    the repository produced in step 2, for each branch in the
>    incremental dump the branch root commit is parented on the tip
>    commit of the same branch in the repo.
>
> At step 6, it shouldn't matter at all which branch is active, because
> where an incremental branch root gets attached has nothing to do with
> which branch is active.
>
> It is sufficient to avoid duplicate commits that cvsps -d 0 -d T and
> cvsps -d T run on the same CVS repo operate on *disjoint sets* of CVS
> file commits.  I can see this technique possibly getting confused if T
> falls in the middle of a changeset where the CVS timestamps for the
> file commits are out of order.
> But that's the same case that will
> fail if we're importing at file-commit granularity, so there's no new
> bug here.
>
> Can you explain at what step my logic is incorrect?

Your logic is correct - for cvsps - the problem is where T comes from.

Perhaps it is simplest to just save a CVS_LAST_IMPORT_TIME file in
$GIT_DIR and not worry about it any more.


John
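
For illustration, the commit1/commit2/commit3 series can be reproduced with a toy model in Python.  This is only a sketch: the (name, time) tuples, incremental_export() and duplicates() are made up for this example and do not correspond to real cvsps or git-cvsimport-3 code.  The "inclusive" flag stands in for whether commits *at* T are emitted as well as newer ones.

```python
# Toy model only: commits are (name, time) pairs, not real cvsps output.
CVS_COMMITS = [("commit1", 1), ("commit2", 2), ("commit3", 3)]


def incremental_export(commits, t, inclusive):
    """Return the commits a hypothetical incremental export from time t emits.

    inclusive=True models emitting commits *at* t as well as newer ones;
    inclusive=False models emitting only commits strictly newer than t.
    """
    if inclusive:
        return [c for c in commits if c[1] >= t]
    return [c for c in commits if c[1] > t]


def duplicates(already_imported, exported):
    """Commits that would be replayed on top of an existing import."""
    return [c for c in exported if c in already_imported]


# The first import copied everything up to time 3.
imported = list(CVS_COMMITS)

# T guessed from a checked-out branch whose tip is at time 2 (too old).
stale_inclusive = duplicates(imported, incremental_export(CVS_COMMITS, 2, True))
stale_strict = duplicates(imported, incremental_export(CVS_COMMITS, 2, False))

# T read back from a recorded last-import time of 3 (assumed to be stored
# somewhere such as a CVS_LAST_IMPORT_TIME file).
recorded = duplicates(imported, incremental_export(CVS_COMMITS, 3, False))

print(stale_inclusive)  # commit2 and commit3 are replayed
print(stale_strict)     # only commit3 is replayed
print(recorded)         # nothing is replayed
```

With a stale T the strict comparison removes the duplicate commit2 but not commit3; only a T that matches the real last import time leaves the two sets disjoint.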