On Fri, Mar 24, 2006 at 08:38:58AM -0800, Keith Packard wrote:
> On Fri, 2006-03-24 at 07:46 -0800, Linus Torvalds wrote:
> >
> > On Fri, 24 Mar 2006, David Mansfield wrote:
> > >
> > > Anyway, I'd like to nail down some of the other nagging ancestry/branch point
> > > problems if possible.
> >
> > What I considered doing was to just ignore the branch ancestry that cvsps
> > gives us, and instead use whatever branch that is closest (ie generates
> > the minimal diff). That's really wrong too (the data just _has_ to be in
> > CVS somehow), but I just don't know how CVS handles branches, and it's how
> > we'd have to do merges if we were to ever support them (since afaik, the
> > merge-back information simply doesn't exists in CVS).
>
> cvsps is more of a problem than cvs itself. Per-file branch information
> is readily available in the ,v files; each version has a list of
> branches from that version, and there are even tags marking the names of
> them. One issue that I've discovered is when files have differing branch
> structure in the same repository. That happens when a branch is created
> while files are checked out on different branches. I'm not quite sure
> what to do in this case; I've been trying several approaches and none
> seem optimal. One remaining plan is to just attach such branches by
> date, but that assumes that the first commit along a branch occurs
> shortly after the branch is created (which isn't required).
>
> Of course, this branch information is only created when a change is made
> to the file along said branch, so most of the repository will lack
> precise branch information for each branch. When you create a child
> branch, the files with no commits in the parent branch will never get
> branch information, so the child branch will be numbered as if it were a
> branch off of the grandparent. Globally, it is possible to reconstruct
> the entire branch structure.

If that last sentence was a typo then you already know this, but
otherwise you may be disappointed to learn that it's not _always_
possible to discern the correct ancestry tree.

The simplest counter-example is two branches where each adds one file
and no files in common are modified. If A and B both branched off of
HEAD and each adds one file, then they should each only have one file.
But if B branched from A which branched from HEAD, then B should also
have the file that was added to A. (*) However, the information to
distinguish these two cases isn't recorded in CVS.

I seem to have described this example more fully in the notes I took
while writing the patch to cvsps that does the global inference you're
describing. You _usually_ can make a very good guess, and the more
files that are modified, the better you can do.

BTW, those notes are still available here:

http://www.codesifter.com/cvsps-notes.txt

If you end up comparing the ancestry tree discovered by your tool and
the tree output by a patched cvsps, I would be very interested in the
results.

-chris

(*) You can distinguish between A->B->head and B->A->head simply by
date.
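
For what it's worth, here is a toy Python sketch of the two-branch
counter-example. It is not how cvsps or CVS stores anything; the file
names a.c and b.c, the ADDED_ON table and the files_on() helper are all
made up for illustration, and the model assumes the only recorded fact
is which branch each file was committed on. Both ancestry hypotheses are
consistent with that same record, yet they disagree about what B should
contain:

```python
# Toy model, not the real ,v format: the only "recorded" fact is the
# branch on which each (hypothetical) file was added.
ADDED_ON = {"a.c": "A", "b.c": "B"}


def files_on(branch, parent):
    """Files that should exist on `branch` under a hypothesized parent map."""
    files = set()
    while branch is not None:
        # Collect files added on this branch, then walk up to its parent.
        files |= {f for f, b in ADDED_ON.items() if b == branch}
        branch = parent.get(branch)
    return files


# Hypothesis 1: A and B both branched off of HEAD.
siblings = {"A": "HEAD", "B": "HEAD", "HEAD": None}
# Hypothesis 2: B branched from A, which branched from HEAD.
chained = {"A": "HEAD", "B": "A", "HEAD": None}

print(sorted(files_on("B", siblings)))  # ['b.c']
print(sorted(files_on("B", chained)))   # ['a.c', 'b.c']
```

ADDED_ON is the same in both runs; only the guessed parent map differs,
which is exactly the part that has to be inferred.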