Re: Avoiding broken Gitweb links and deleted objects

William Swanson <swansontec@xxxxxxxxx> · Fri, 10 May 2013 00:34:07 -0700

On Thu, May 9, 2013 at 11:37 PM, Junio C Hamano <gitster@xxxxxxxxx> wrote:
>>> What's a good strategy for avoiding breaking those links?
>>
>> Do not rebase published history.
>
> All true, but I think we could do a bit "better", although I am
> still on the fence if what I am going to suggest in this message is
> truly "better".
>
> Let me idly speculate and think aloud, "what if".
>
> Imagine that a user runs "git rebase" on a history leading to commit
> X to create an alternate, improved history that leads to commit Y.
> What if we teach "git rebase" to record, perhaps by default, an
> "ours" merge on top of Y that takes the tree state of Y but has X as
> its second parent, and "git log" and its family to ignore such an
> artificial "ours" merge that records a tree that is identical to one
> of its parents, again perhaps by default?  "git log" works more or
> less in such a way already, but we might want to teach other modes
> like --full-history and --simplify-merges to ignore "ours" to hide
> such an artificial merge by default, with an audit option to
> unignore them.
>
> The history transfer will not break, as there is a true ancestry
> that preserves the superseded history leading to X, while in the
> daily use and inspection of the history, such a superseded history
> will not bother the user by default.  When the user really wants to
> see it (e.g. following a stale gitweb link, or with "git log $X"),
> such a superseded side history is still there.
>
> Private history rewriting lets us pretend to be perfect, which is a
> major plus in the distributed workflow Git gives us, and such a mode
> of operation will defeat that in a big way, which might turn out to
> be a major downside, of course.
>
> Also, rebases and filter branches that are done in order to excise
> unwanted objects from the history (committed a password in a file,
> anybody?) need a way to turn it off.

I started working on something like this a few weeks ago, but
eventually came to the conclusion that this information does not
belong in the commit graph itself. You have already identified some of
the same problems I found, so I will not repeat them. In the end, you
either publish everything (including bad things like passwords or
dead ends), or you leave the rebase history-preservation feature
turned off all the time and then forget to turn it on when it really
matters.
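A toy model of the merge-based proposal quoted above shows why the superseded commits end up in the graph permanently. Everything here is hypothetical and is not Git's internals: the Commit type, the tree strings and the function names are illustrative only. The toy log walk hides a merge whose tree matches its first parent. Everything reachable from the new tip, including the superseded commit X, still travels with any push of that tip.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    name: str
    tree: str            # stand-in for the tree object's hash
    parents: tuple = ()  # parent Commit objects; first parent comes first


def record_ours_merge(x, y):
    """After rewriting the history ending at x into y, record a merge that
    keeps y's tree and has x as its second parent."""
    return Commit(f"M({y.name}<-{x.name})", y.tree, (y, x))


def is_artificial(c):
    """A merge whose tree is identical to one of its parents' trees."""
    return len(c.parents) > 1 and any(p.tree == c.tree for p in c.parents)


def log(tip, audit=False):
    """Toy 'git log': hides artificial merges and their superseded side,
    unless audit=True."""
    shown, seen, stack = [], set(), [tip]
    while stack:
        c = stack.pop()
        if c.name in seen:
            continue
        seen.add(c.name)
        if is_artificial(c) and not audit:
            # In this proposal the first parent is the rewritten history.
            stack.append(c.parents[0])
            continue
        shown.append(c.name)
        stack.extend(reversed(c.parents))
    return shown


def reachable(tip):
    """What a push or fetch of tip transfers: every ancestor, hidden or not."""
    names, stack = set(), [tip]
    while stack:
        c = stack.pop()
        if c.name not in names:
            names.add(c.name)
            stack.extend(c.parents)
    return names


base = Commit("A", "t0")
x = Commit("X", "t1-with-password", (base,))  # superseded commit
y = Commit("Y", "t1", (base,))                # its rewritten replacement
tip = record_ours_merge(x, y)

print(log(tip))                # ['Y', 'A']
print(sorted(reachable(tip)))  # ['A', 'M(Y<-X)', 'X', 'Y']
```

The default log hides X. The reachability set, however, shows that X goes out with every push of the new tip. A leaked password therefore cannot be kept private without switching the feature off.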

A better approach, I think, would be to enhance the reflogs to the
point where they can provide this information in a reliable manner.
The Git garbage collector already keeps any object that is reachable
from a reflog entry, so "git reflog expire" just needs to learn how to
avoid deleting topologically interesting entries like rebases. For a
shared hosting scenario like GitHub, this would prevent the server
from expiring published commits and creating broken links.
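The expiry rule described above can be sketched as a toy model. This is not Git's implementation. The graph, the ReflogEntry type and the expire policy are hypothetical. "Topologically interesting" is assumed here to mean a non-fast-forward update, where the old tip is not an ancestor of the new one. That is how a rebase or forced update appears in a reflog.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ReflogEntry:
    old: Optional[str]  # previous tip; None when the ref was created
    new: str            # tip after the update
    timestamp: int


def is_ancestor(graph, a, b):
    """True if commit a is reachable from commit b (a == b counts)."""
    stack, seen = [b], set()
    while stack:
        c = stack.pop()
        if c == a:
            return True
        if c not in seen:
            seen.add(c)
            stack.extend(graph.get(c, ()))
    return False


def is_rewrite(graph, entry):
    """Non-fast-forward update (rebase, forced update): the old tip is not
    an ancestor of the new tip, so the old commits may become unreachable."""
    return entry.old is not None and not is_ancestor(graph, entry.old, entry.new)


def expire(graph, reflog, cutoff):
    """Hypothetical policy: drop entries older than cutoff unless they rewrote history."""
    return [e for e in reflog if e.timestamp >= cutoff or is_rewrite(graph, e)]


def gc_keep(graph, ref_tips, reflog):
    """Commits a gc would keep: those reachable from ref tips or reflog entries."""
    stack = list(ref_tips)
    for e in reflog:
        stack.append(e.new)
        if e.old is not None:
            stack.append(e.old)
    keep = set()
    while stack:
        c = stack.pop()
        if c not in keep:
            keep.add(c)
            stack.extend(graph.get(c, ()))
    return keep


# Hypothetical branch: X was published, then rebased into Y, then Z was added.
graph = {"A": (), "X": ("A",), "Y": ("A",), "Z": ("Y",)}
reflog = [
    ReflogEntry(None, "A", 1),
    ReflogEntry("A", "X", 2),
    ReflogEntry("X", "Y", 3),    # the rebase
    ReflogEntry("Y", "Z", 100),
]
kept = expire(graph, reflog, cutoff=50)
print(sorted(gc_keep(graph, ["Z"], kept)))  # ['A', 'X', 'Y', 'Z']
```

A purely age-based rule would drop the rebase entry and let X be pruned. Keeping non-fast-forward entries preserves X for as long as the branch exists.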

Since Git maintains reflogs for all heads (by default in non-bare
repositories), including remote-tracking refs under refs/remotes,
this strategy for preserving history also works in a
collaborative environment. Each repository remembers what it has seen,
including rebases from remotes (which appear as "forced updates"). On
the other hand, work-in-progress commits only appear in the local
reflogs, and won't appear in other repositories unless someone pulls
or pushes them.
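A small simulation shows how remote-tracking reflogs capture rewrites that happen elsewhere, while unpublished work stays where it was created. The Repo class, its methods and the "forced update" message text are a hypothetical model, not Git's code. It mirrors only the behavior described above. A fetch copies the objects reachable from the published tip and appends an entry to the reflog of refs/remotes/<remote>/<branch>.

```python
class Repo:
    """Toy repository (hypothetical, not Git internals): a commit graph,
    refs, and one reflog per ref."""

    def __init__(self, name):
        self.name = name
        self.graph = {}    # commit id -> tuple of parent ids
        self.refs = {}     # ref name -> commit id
        self.reflogs = {}  # ref name -> list of (old, new, message)

    def update_ref(self, ref, new, message):
        """Move a ref and append an entry to that ref's reflog."""
        old = self.refs.get(ref)
        self.refs[ref] = new
        self.reflogs.setdefault(ref, []).append((old, new, message))

    def commit(self, ref, cid, parent=None, message="commit"):
        """Create commit cid on `parent` (default: the ref's current tip)."""
        if parent is None:
            parent = self.refs.get(ref)
        self.graph[cid] = (parent,) if parent is not None else ()
        self.update_ref(ref, cid, message)

    def is_ancestor(self, a, b):
        """True if commit a is reachable from commit b."""
        stack, seen = [b], set()
        while stack:
            c = stack.pop()
            if c == a:
                return True
            if c not in seen:
                seen.add(c)
                stack.extend(self.graph.get(c, ()))
        return False

    def fetch(self, remote, branch):
        """Copy objects reachable from remote's branch, then update
        refs/remotes/<remote>/<branch> and its reflog."""
        tip = remote.refs[f"refs/heads/{branch}"]
        stack = [tip]
        while stack:  # only objects reachable from the published tip
            c = stack.pop()
            if c not in self.graph:
                self.graph[c] = remote.graph[c]
                stack.extend(remote.graph[c])
        tracking = f"refs/remotes/{remote.name}/{branch}"
        old = self.refs.get(tracking)
        forced = old is not None and not self.is_ancestor(old, tip)
        self.update_ref(tracking, tip, "forced update" if forced else "fetch")


upstream = Repo("origin")
upstream.commit("refs/heads/master", "A")
upstream.commit("refs/heads/master", "B")
upstream.commit("refs/heads/wip", "W", parent="B")  # never pushed or pulled

local = Repo("local")
local.fetch(upstream, "master")
upstream.commit("refs/heads/master", "B2", parent="A", message="rebase")
local.fetch(upstream, "master")

print(local.reflogs["refs/remotes/origin/master"])
# [(None, 'B', 'fetch'), ('B', 'B2', 'forced update')]
print("W" in local.graph)  # False
```

The local repository still records the pre-rebase commit B in its remote-tracking reflog. The work-in-progress commit W exists only in the upstream repository's own reflog.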

If it does become necessary to delete some published historical
information (like passwords), it is still possible to delete reflog
entries by hand, for example with "git reflog delete". Reflogs are not
part of the object database, so doing this doesn't change any object
hashes.
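The toy model below illustrates this point. Removing a reflog entry changes which commits a garbage collection considers reachable. It does not change any commit's name. The commit_id encoding, the gc helper and the sample history are all hypothetical. They mimic only the principle of content addressing, not Git's actual object format.

```python
import hashlib


def commit_id(tree, parents, message):
    """Toy content address: SHA-1 over tree, parent ids and message.
    Git's real commit encoding differs; only the principle carries over."""
    lines = [f"tree {tree}"] + [f"parent {p}" for p in parents] + ["", message]
    return hashlib.sha1("\n".join(lines).encode()).hexdigest()


def add(graph, tree, parents, message):
    """Store a commit under its content-derived id and return the id."""
    cid = commit_id(tree, parents, message)
    graph[cid] = tuple(parents)
    return cid


def gc(graph, refs, reflog):
    """Keep only commits reachable from ref tips or from any reflog entry."""
    stack = list(refs.values()) + [c for entry in reflog for c in entry if c]
    keep = set()
    while stack:
        c = stack.pop()
        if c not in keep:
            keep.add(c)
            stack.extend(graph[c])
    return {c: parents for c, parents in graph.items() if c in keep}


# Hypothetical history: x leaked a password, y is its rebased replacement.
graph = {}
a = add(graph, "tree-0", [], "initial")
x = add(graph, "tree-with-password", [a], "add config")
y = add(graph, "tree-clean", [a], "add config")
refs = {"refs/heads/master": y}
reflog = [(None, a), (a, x), (x, y)]  # (old, new) pairs for master

assert x in gc(graph, refs, reflog)   # the reflog still pins x

# Delete the reflog entries that mention x, then collect garbage.
reflog = [entry for entry in reflog if x not in entry]
after = gc(graph, refs, reflog)

assert x not in after                 # the leaked commit is gone,
assert set(after) == {a, y}           # a and y keep their ids,
assert all(p in after for ps in after.values() for p in ps)  # links intact
```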

-William