On Thu, May 9, 2013 at 11:37 PM, Junio C Hamano <gitster@xxxxxxxxx> wrote:
>>> What's a good strategy for avoiding breaking those links?
>>
>> Do not rebase published history.
>
> All true, but I think we could do a bit "better", although I am
> still on the fence if what I am going to suggest in this message is
> truly "better".
>
> Let me idly speculate and think aloud, "what if".
>
> Imagine that a user runs "git rebase" on a history leading to commit
> X to create an alternate, improved history that leads to commit Y.
> What if we teach "git rebase" to record, perhaps by default, an
> "ours" merge on top of Y that takes the tree state of Y but has X as
> its second parent, and "git log" and its family to ignore such an
> artificial "ours" merge that records a tree that is identical to one
> of its parents, again perhaps by default? "git log" works more or
> less in such a way already, but we might want to teach other modes
> like --full-history and --simplify-merges to ignore "ours" to hide
> such an artificial merge by default, with an audit option to
> unignore them.
>
> The history transfer will not break, as there is a true ancestry
> that preserves the superseded history leading to X, while in the
> daily use and inspection of the history, such a superseded history
> will not bother the user by default. When the user really wants to
> see it (e.g. following a stale gitweb link, or with "git log $X"),
> such a superseded side history is still there.
>
> Private history rewriting lets us pretend to be perfect, which is a
> major plus in the distributed workflow Git gives us, and such a mode
> of operation will defeat that in a big way, which might turn out to
> be a major downside, of course.
>
> Also, rebases and filter branches that are done in order to excise
> unwanted objects from the history (committed a password in a file,
> anybody?) need a way to turn it off.
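A minimal Python model of the "ours"-merge idea quoted above, for illustration only. Commits are hypothetical `Commit` objects with a stand-in `tree` string rather than real Git objects, and `visible_log` is not how `git log` is implemented. It hides any merge whose tree is identical to one of its parents and follows only that parent, while `audit=True` plays the role of the proposed option to unignore such merges. The walk is a plain depth-first traversal, not Git's date ordering.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Commit:
    name: str
    tree: str  # hypothetical stand-in for the id of the commit's tree object
    parents: list[Commit] = field(default_factory=list)


def reachable(tip: Commit) -> set[str]:
    """Names of every commit reachable from tip (what transfer and gc keep)."""
    seen: set[str] = set()
    stack = [tip]
    while stack:
        c = stack.pop()
        if c.name not in seen:
            seen.add(c.name)
            stack.extend(c.parents)
    return seen


def visible_log(tip: Commit, audit: bool = False) -> list[str]:
    """Depth-first walk that hides artificial "ours" merges unless audit=True."""
    seen: set[str] = set()
    out: list[str] = []
    stack = [tip]
    while stack:
        c = stack.pop()
        if c.name in seen:
            continue
        seen.add(c.name)
        same_tree = [p for p in c.parents if p.tree == c.tree]
        if not audit and len(c.parents) > 1 and same_tree:
            # Hide the merge and follow only the parent whose tree it copies,
            # so the superseded side (X) is not listed.
            stack.append(same_tree[0])
            continue
        out.append(c.name)
        # Push parents in reverse so the first parent is visited first.
        stack.extend(reversed(c.parents))
    return out


# Example: topic X1..X was rebased onto upstream U as Y1..Y, and the
# hypothetical rebase recorded M, an "ours" merge of Y with X as second parent.
B = Commit("B", "t-base")
X1 = Commit("X1", "t-x1", [B])
X = Commit("X", "t-x", [X1])
U = Commit("U", "t-u", [B])
Y1 = Commit("Y1", "t-y1", [U])
Y = Commit("Y", "t-y", [Y1])
M = Commit("M", "t-y", [Y, X])  # records the same tree as Y

print(visible_log(M))              # ['Y', 'Y1', 'U', 'B']
print(visible_log(M, audit=True))  # ['M', 'Y', 'Y1', 'U', 'B', 'X', 'X1']
print(visible_log(X))              # ['X', 'X1', 'B']
```

The superseded commit `X` stays reachable from `M`, so fetch, push, and gc would keep it, yet the default listing from `M` shows only the rebased line, and `visible_log(X)` still resolves a stale link. Note that this rule would equally hide a genuine merge that happens to record one parent's tree unchanged.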
I started working on something like this a few weeks ago, but eventually came to the conclusion that this information does not belong in the commit graph itself. You have already identified some of the same problems I found, so I will not repeat them. In the end, you either publish everything (including bad things like passwords or dead ends), or you leave the rebase history-preservation feature turned off all the time and then forget to turn it on when it really matters.

A better approach, I think, would be to enhance the reflogs to the point where they can provide this information in a reliable manner. The Git garbage collector already refuses to prune objects referenced from the reflogs, so "git reflog expire" just needs to learn how to avoid deleting topologically interesting entries like rebases. For a shared scenario like GitHub, this would prevent the server from expiring published commits and creating broken links.

Since Git maintains reflogs for all heads, including those in refs/remotes, this strategy for preserving history also works in a collaborative environment. Each repository remembers what it has seen, including rebases from remotes (which appear as "forced updates"). On the other hand, work-in-progress commits only appear in the local reflogs, and won't appear in other repositories unless someone pulls or pushes them.

If it does become necessary to delete some published historical information (like passwords), it is still possible to delete reflog entries by hand. They are not part of the object database, so doing this doesn't break any hashes.

-William
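The reflog-retention policy described in the reply can be sketched the same way. This is a hypothetical model, not `git reflog expire`: each `ReflogEntry` is a simplified (old, new, message, age) record rather than the real reflog line format, the sample messages are made up, and recognizing rewrites by the substrings "rebase" or "forced-update" is an illustrative assumption, since real message wording varies by command and Git version.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ReflogEntry:
    old: str       # object name the ref pointed to before the update
    new: str       # object name it pointed to after the update
    message: str   # reflog message (sample text, not real Git output)
    age_days: int  # how old the entry is


# Hypothetical markers for "topologically interesting" updates (rewrites).
INTERESTING_MARKERS = ("rebase", "forced-update")


def is_interesting(entry: ReflogEntry) -> bool:
    return any(marker in entry.message for marker in INTERESTING_MARKERS)


def expire(entries: list[ReflogEntry], cutoff_days: int) -> list[ReflogEntry]:
    """Drop entries older than cutoff_days unless they record a rewrite."""
    return [e for e in entries if e.age_days <= cutoff_days or is_interesting(e)]


def protected_objects(entries: list[ReflogEntry]) -> set[str]:
    """Objects a gc pass would treat as referenced by the surviving entries."""
    names: set[str] = set()
    for e in entries:
        names.update((e.old, e.new))
    return names


history = [
    ReflogEntry("a1", "b2", "commit: tweak docs", 200),
    ReflogEntry("b2", "c3", "rebase (finish): refs/heads/topic onto f00", 150),
    ReflogEntry("c3", "d4", "fetch origin: forced-update", 120),
    ReflogEntry("d4", "e5", "commit: fix typo", 10),
]

kept = expire(history, cutoff_days=90)
print([e.age_days for e in kept])       # [150, 120, 10]
print(sorted(protected_objects(kept)))  # ['b2', 'c3', 'd4', 'e5']
```

Keeping the old rebase entry keeps its `old` object (the pre-rebase tip `b2`) protected from pruning, while an ordinary entry of the same age is expired; deleting the rebase entry by hand would release `b2` without touching any commit hashes.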