On Tue, Dec 17, 2013 at 3:58 PM, Eric S. Raymond <esr@xxxxxxxxxxx> wrote:
> Johan Herland <johan@xxxxxxxxxxx>:
>> HOWEVER, this only solves the "cheap" half of the problem. The reason
>> people want incremental CVS import is to avoid having to repeatedly
>> convert the ENTIRE CVS history. This means that the CVS exporter must
>> learn to start from a given point in the CVS history (identified by
>> the above mapping) and then quickly and efficiently convert only the
>> "new stuff" without having to consult/convert the rest of the CVS
>> history. THIS is the hard part of incremental import. And it is much
>> harder for systems like CVS - where the starting point has a broken
>> concept of history...
>
> I know of *no* importer that solves what you call the "deep" part of
> the problem. cvsps didn't, cvs-fast-import doesn't, cvs2git doesn't.
> All take the easy way out; parse the entire history, and limit what
> is emitted in the output stage.

Yes, and starting from a non-incremental importer, that's probably the
only viable way to approach incrementalism.

> Actually, given what I know about delta-file parsing I'd say a "true"
> incremental CVS exporter would be so hard that it's really not worth the
> bother. The problem is the delta-based history representation.
> Trying to interpret that without building a complete set of history
> states in the process (which is most of the work a whole-history
> exporter does) would be brutally difficult - barely possible in
> principle maybe, but I wouldn't care to try it.

Agreed. You would either have to re-parse the entire ,v file, or you
would have to store some (probably a lot of) intermediate state that
would allow you to resolve the deltas of new revisions without having
to parse all the old revisions.

> It's much more practical to tune up a whole-history exporter so it's
> acceptably fast, then do incremental dumping by suppressing part of
> the conversion in the output stage.
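To make the "delta-based history representation" problem concrete, here is a sketch (a simplified model of RCS ,v storage, not code from any actual converter) of why reconstructing an old revision forces you to walk the delta chain. RCS keeps only the head revision as full text; each older trunk revision is a reverse delta (an ed-style edit script) against its successor, so reaching revision 1.1 means applying every delta from the head downwards:

```python
# Simplified model of RCS/,v trunk storage: head revision as full text,
# older revisions as reverse deltas. Edit commands are 'dN M' (delete M
# lines starting at original line N) and 'aN M' followed by M lines
# (append them after original line N), in ascending order of N.

def apply_delta(text, script):
    """Apply one RCS-style edit script to a revision's list of lines."""
    out = list(text)
    offset = 0  # how earlier commands have shifted line numbers
    i = 0
    while i < len(script):
        cmd = script[i]
        i += 1
        op = cmd[0]
        n, m = (int(x) for x in cmd[1:].split())
        if op == "d":
            start = n - 1 + offset
            del out[start:start + m]
            offset -= m
        else:  # "a": insert the next m script lines after original line n
            pos = n + offset
            out[pos:pos] = script[i:i + m]
            i += m
            offset += m
    return out

# Head revision (say, 1.3) is stored whole; everything older is a delta.
head = ["line one", "line two (new)", "line three"]
delta_1_2 = ["d2 1", "a2 1", "line two"]  # 1.3 -> 1.2
delta_1_1 = ["d3 1"]                      # 1.2 -> 1.1

# To get 1.1 you MUST materialize 1.2 first -- this chain-walking is the
# "complete set of history states" work quoted above.
rev_1_2 = apply_delta(head, delta_1_2)
rev_1_1 = apply_delta(rev_1_2, delta_1_1)
print(rev_1_1)  # -> ['line one', 'line two']
```

An incremental exporter that wants to emit only the newest revision still has to either redo this walk or cache the intermediate revision texts, which is exactly the trade-off described above.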
> cvs-fast-export's benchmark repo is the history of GNU troff. That's
> 3057 commits in 1549 master files; when I reran it just now the
> whole-history conversion took 49 seconds. That's 3.7K commits a
> minute, which is plenty fast enough for anything smaller than (say)
> one of the *BSD repositories.

Those are impressive numbers, and in that scenario, using a
"repurposed" converter (i.e. a whole-history converter that has been
taught to do incremental output) is undoubtedly the best solution.

However, I fear that you underestimate the number of users who want to
use Git against CVS repos that are orders of magnitude larger (in both
dimensions: #commits and #files) than your example repo. For these
repos, running a proper whole-history conversion takes hours - or even
days - and working incrementally on top of that is simply out of the
question. Obviously, they still need the whole-history converter for
the future point in time when they have collected enough
motivation/buy-in to migrate the entire project/company to a better
VCS, but until then, they want to use Git locally, while enduring CVS
on the server.

At my previous $DAYJOB, I was one of those people, and I ended up with
a two-pronged "solution" to the problem (this was ~5 years ago now, so
I'm somewhat fuzzy on the details):

1. Adopt an ad hoc incremental approach for working against the CVS
   server: Keep a CVS checkout next to my git repo, and maintain a map
   between corresponding states/commits in CVS and git. When I update
   from CVS, apply the corresponding patch to the "cvs" branch in my
   git repo, rebase my git-based work on top of that, and use
   "git cvsexportcommit" to propagate my Git work back to CVS. This is
   crude and hacky as hell, but it provides me with a local git-based
   workflow.

2. Start convincing fellow developers and lobbying management about
   switching away from CVS.
We got a discussion started, gained momentum, and eventually I got to
spend most of my time preparing and performing the full-history
conversion from CVS to git. This happened mostly before cvs2svn grew
its cvs2git sibling, so I ended up writing a custom converter for our
particular variation of insane and demented CVS practices. Today, I
would probably have gone for cvs2git, or your more recent work.

But back to my main point: I believe there are two classes of CVS
converters, and I have slowly come to believe that they solve two
fundamentally different problems.

The first problem is "how to faithfully recreate the project history
in a different VCS", which is solved by the full-history converters.
Case closed.

The second problem is somewhat harder to define, but I'll try: "how to
allow me to work productively against a CVS server, without having to
deal with the icky CVS bits". Compared to the first problem, the
parameters differ somewhat:

- Conversion/synchronization time must be short, to allow me to stay
  productive and up-to-date with my colleagues.

- Correctness of the "current state" is very important. I must be sure
  that my git working tree is identical to its CVS counterpart, so
  that my git changes can be reproduced in CVS as faithfully as
  possible.

- Correctness of "history" is less important. I can accept a
  messy/incorrect Git history, since I can always query the CVS server
  for the "correct" history (whatever that means in a CVS context...).

- As a generic CVS user (not the CVS admin) I don't necessarily have
  direct access to the ,v files stored on the CVS server.

Although a full-history converter with fairly stable output can be
made to support this second problem for repos up to a certain size,
there will probably still be users who want to work incrementally
against much bigger repos, and I don't think _any_
full-history-gone-incremental importer will be able to support the
biggest repos.
Consequently, I believe that for these big repos it is _impossible_ to
get both fast incremental workflows and a high degree of (historical)
correctness.

cvsps tried to be all of the above, and failed badly at the
correctness criterion. Therefore I support your decision to "shoot it
through the head". I certainly also support any work towards making a
full-history converter work in an incremental manner, as it will be
immensely useful for smaller CVS repos. But at the same time we should
realize that it won't be a solution for incrementally working against
_large_ CVS repos.

Although it should have been obvious a long time ago, the removal of
cvsps has now made it abundantly clear that Git currently provides no
way to support the incremental workflow against large CVS repos. Maybe
that is OK, and we can ignore that niche, waiting for the few
remaining large CVS repos to die? Or maybe we need a new effort to
fill it? Something that is NOT based on a full-history converter, and
does NOT try to guarantee a history-correct conversion, but that DOES
try to guarantee fast and relatively worry-free two-way
synchronization against a CVS server.

Unfortunately (or fortunately, depending on your POV) I have not had
to touch CVS in a long while, and I don't see that changing soon, so
it is not my itch to scratch.

...Johan

--
Johan Herland, <johan@xxxxxxxxxxx>
www.herland.net