On Wed, Sep 10, 2008 at 12:56:03AM +0200, Stephen R. van den Berg wrote:
> The purpose I'd use the origin links for is to manage software projects
> that consist of 7 main branches which have branched in (on average) two
> year intervals, which never get merged anymore.  The only thing that
> happens is that there are backports amongst the branches about two per
> week.
>
> The only way to perform the backports is by using cherry-pick.
> The history of each backport *is* important though.
> Since all the developers who care about the multiple release branches
> have all the relevant branches in their repository, the presence of
> a origin object is by no means random, it's a certainty.

I'd argue that the origin link is a bit too general for your proposed use. One of the problems with the origin link is that it is only a one-way pointer. Given a newer commit, you know that it is (somehow) weakly related to an older commit. So your proposed workflow only works if cherry-picks only happen in one direction. That isn't always true, especially in distributed environments where the bugfix might happen on someone else's development branch, and then it gets pulled in, or perhaps rebased in, and you want to know they are related.

I would argue the best way to do that is to store (either in the object or in the free-form text area) not the link, which would have to get renumbered, but rather the identifier for the bug(s) that this commit fixes.
So for example, consider a convention where in the body of the free-form text area, before the Signed-off-by:, Acked-by:, and CC: headers for those projects that use them, we add something like the following:

    Addresses-Bug: Red_Hat/149480, Sourceforge_Feature/120167

or

    Addresses-Bug: Debian/432865, Launchpad/203323, Sourceforge_Bug/1926023

Once you have this information, it is not difficult to maintain a Berkeley DB database which maps a particular bug identifier (e.g., Red_Hat/149480, or Debian/471977, or Launchpad/203323) to a series of commits.

The advantage of this scheme is that if a bug has been fixed in multiple branches, you can see the association between two commits in two different branches very easily. Furthermore, you get a link back to the actual bug in one or more bug tracking systems, which some porcelain program could transform into a hot-link that, when clicked, opens a browser window to the bug in question.

In contrast, using your proposed origin scheme, if the bug fix was originally created in some development branch and then cherry-picked into two separate maintenance branches, and you don't have the development branch in your repository (maybe that development branch wasn't kept for some reason), the origin link in the two maintenance branches would point to a non-existent commit ID, and you wouldn't be able to establish a linkage between them. By using an independent bug identifier as the way of creating the linkage, you're preserving *much* more useful information, and you can reliably establish a relationship between two commits.

In terms of your arguments about why free-form is bad, in another message:

>- No strict definition of what it means.
>- Diverging porcelain implementations making use of the field in ever so
>  slightly changing ways over the years.

This can be a problem regardless of where you store the information.
Whether you store it in the free-form text or in the git object header, if you don't make sure it is well-defined, you're in trouble.

>- You cannot rely on the field being always available.

This is true regardless of where you store it; older versions of git won't store the git origin link, for example, unless you plan to break backwards compatibility with all existing git repositories, which would be a bad idea. :-)

One nice thing about using text in the free-form area is that anyone can enter it without needing a new version of git. The downside is that people could typo the header in some fashion. But that can be dealt with by having a newer version of the git porcelain validate the bug identifier and/or check for obvious spelling mistakes and issue a warning ("Looks like you may have misspelled 'Adresses-Bug'; perhaps you should fix this via git commit --amend?").

In contrast, if you put it in the git object header, there is no possibility of using the field at all until you update to a version of git that supports it. And if some developer on your project is using an older version of git when they rebase or cherry-pick a commit, the origin header will be completely lost; but if it is stored in the free-form area, the information will be brought along for the ride for free.

>- Automated "renumbering" becomes difficult at best.

This is actually one of the reasons why I don't like the origin link. If you use the origin link, it's *still* not obvious whether you should rewrite the commit ID or not. For example, in some workflows, you have two branches pointing to the same commit before you do the rebase; the rebase will only update the current branch pointer, so the other branch still points at the original series of commits. Worse yet, someone may have done a cherry-pick *before* the rebase. Hence, the only thing you can do is keep *both* commit IDs.
This means that over time, you can't get rid of any commit IDs when you do a rebase, which means the number of commit IDs in the origin link will always increase whenever you do a rebase or a cherry-pick.

This is why, for the use case where you are trying to figure out whether a bug exists in a particular branch, it is ***much*** better to rendezvous using a bug identifier; it provides an extra layer of indirection which results in a much more stable identifier that is guaranteed to work.

I understand it won't work for those cases where you don't have a bug tracking identifier, but in fact, if you need this functionality at all (and I am not convinced that you do), the ***much*** better approach is to use the same approach as the bug tracking identifier, and add a level of indirection.

How would that work in practice? Whenever you create a new commit, create a UUID which is assigned to the patch. This UUID is not modified by git rebase or git cherry-pick, and it should be optionally kept or modified on a git commit --amend. Ideally, said UUID would be exported via git-format-patch, and imported via git-am, and via systems that use patches, such as guilt or stg. This becomes a handy way of recognizing patches even if they aren't being stored in git, for example Andrew Morton's mm patch series.

Now, whether you store this UUID in the free-form text area or in the git object header really doesn't matter in the long run. You can just as easily have porcelain suppress a line in the free-form text area as you can have the porcelain print the UUID when it is stored in the object header.
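
To make the indirection concrete, here is a minimal Python sketch of rendezvousing on a stable identifier instead of a commit ID. It is only an illustration under stated assumptions, not an existing git feature: the "Patch-UUID" trailer name and the helper names extract_keys and build_index are invented for this example, the commit IDs in the sample are made up, and "Addresses-Bug" is the convention suggested earlier in this message. The input is assumed to be (commit ID, commit message) pairs that some other tool has already pulled out of the repository.

```python
import re
from collections import defaultdict

# "Addresses-Bug" is the convention proposed in this message; "Patch-UUID" is
# a hypothetical trailer name for the per-patch UUID (an assumption).
TRAILER_RE = re.compile(r"^(?:Addresses-Bug|Patch-UUID):[ \t]*(.+)$", re.MULTILINE)


def extract_keys(message):
    """Return the set of bug identifiers / UUIDs named in one commit message."""
    keys = set()
    for value in TRAILER_RE.findall(message):
        # A single trailer line may carry several comma-separated identifiers.
        for item in value.split(","):
            item = item.strip()
            if item:
                keys.add(item)
    return keys


def build_index(commits):
    """Map each identifier to the commit IDs whose messages mention it.

    `commits` is an iterable of (commit_id, message) pairs, in any order and
    from any number of branches.
    """
    index = defaultdict(list)
    for commit_id, message in commits:
        for key in sorted(extract_keys(message)):
            index[key].append(commit_id)
    return dict(index)


if __name__ == "__main__":
    # Made-up commits: the same fix applied on two maintenance branches.
    sample = [
        ("1111111", "Fix crash\n\nAddresses-Bug: Debian/432865, Launchpad/203323\n"),
        ("2222222", "Fix crash (backport)\n\nAddresses-Bug: Debian/432865\n"),
    ]
    for key, commit_ids in sorted(build_index(sample).items()):
        print(key, commit_ids)
```

Because the key is the bug identifier or UUID rather than a commit ID, the two commits stay linked even if the branch where the fix first appeared is not present in the repository, and nothing needs renumbering after a rebase.
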
Yes, the UUID approach means that you have to maintain a separate database so you can easily find the list of commits that contain a particular UUID. But I suspect you would need this for the origin link concept anyway: sooner or later, some of the more useful uses of said link would require you to find the commits which have origin links to the original commit, which means you would need to create and maintain this database regardless.

And the maintenance of this database is purely optional; you only need it if you care about efficiently looking up UUIDs. Given that "time git log > /dev/null" on the kernel tree takes only six seconds on my laptop, and "git log > /dev/null" takes only 0.148 seconds for e2fsprogs, for many projects you might not even need the database to accelerate lookups via UUID.

- Ted