Re: Find out on which branch a commit was originally made (was ANNOUNCE git-what-branch)

Seth Robertson <in-gitvger@xxxxxxxx> · Wed, 22 Sep 2010 19:26:19 -0400

In message <4C9A66AF.5000302@xxxxxxxxx>, Artur Skawina writes:

    This started in a thread about locating dead topic branches

Isn't that pretty easy to do?  Something like
`git fsck --unreachable master | grep commit`?  Post-processing that
to assemble branches would seem to be fairly simple.

But yes, I wanted something completely different.  Something more
like: if a bug was introduced in commit X, what releases or branches
has it contaminated (or more positively, if a feature was introduced,
where was it made available).  The simple case is figuring out on
which branch a commit was originally made.
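
The contamination question reduces to reachability: a branch or release
contains commit X if X is an ancestor of its tip.  Git answers this
directly with `git branch --contains X` and `git tag --contains X`.  A
minimal sketch of the same idea in Python, over a toy history (the
`PARENTS` and `REFS` maps and all commit names are made up for
illustration, and this is not a description of how git-what-branch works):

```python
# Toy history: commit -> list of parents (hypothetical names).
PARENTS = {
    "m1": [],
    "m2": ["m1"],
    "x":  ["m1"],          # the commit of interest, made on a topic branch
    "t2": ["x"],
    "m3": ["m2", "t2"],    # merge of the topic branch into master
    "r1": ["m2"],          # release branch cut before the merge
}
REFS = {"master": "m3", "topic": "t2", "release-1.0": "r1"}


def ancestors(tip, parents):
    """Return the set of commits reachable from tip, including tip itself."""
    seen, stack = set(), [tip]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents[commit])
    return seen


def refs_containing(commit, refs, parents):
    """Return the sorted names of refs whose history includes commit."""
    return sorted(name for name, tip in refs.items()
                  if commit in ancestors(tip, parents))


print(refs_containing("x", REFS, PARENTS))  # ['master', 'topic']
```

This only says where X is visible now.  Both `master` and `topic` contain
it, so deciding which one it was originally made on needs more than
reachability.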

I was unhappy when I realized that another way code could get out was
through cherry-picks, and that there doesn't seem to be any
non-brute-force method (that is, anything short of computing checksums
of patches for every patch in the tree) to discover them.
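
Git does ship the building block for the brute-force approach:
`git patch-id` computes a checksum of a patch with line numbers and
whitespace ignored, and `git cherry` uses that to compare two branches.
The sketch below imitates the idea in Python on toy data.  The
normalization is a deliberately crude stand-in, not git's actual patch-id
algorithm, and the commit names and diffs are invented:

```python
import hashlib
from collections import defaultdict


def toy_patch_id(diff_text):
    """Hash only the added/removed lines, ignoring whitespace and hunk headers.

    A crude stand-in for `git patch-id`, not its actual algorithm.
    """
    digest = hashlib.sha1()
    for line in diff_text.splitlines():
        if line.startswith(("+++", "---", "@@")):
            continue  # file names and line numbers differ between picks
        if line.startswith(("+", "-")):
            digest.update("".join(line.split()).encode())
            digest.update(b"\n")
    return digest.hexdigest()


def find_duplicate_patches(diffs):
    """Group commits whose patches hash identically (likely cherry-picks)."""
    groups = defaultdict(list)
    for commit, diff_text in diffs.items():
        groups[toy_patch_id(diff_text)].append(commit)
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)


# Hypothetical commits: "b7" is "a3" cherry-picked onto another branch,
# so the hunk header moved but the change itself is the same.
DIFFS = {
    "a3": "--- a/f.c\n+++ b/f.c\n@@ -10,2 +10,2 @@\n-int x = 0;\n+int x = 1;\n",
    "b7": "--- a/f.c\n+++ b/f.c\n@@ -42,2 +42,2 @@\n-int x = 0;\n+int x = 1;\n",
    "c9": "--- a/g.c\n+++ b/g.c\n@@ -1,1 +1,1 @@\n-return 0;\n+return 2;\n",
}

print(find_duplicate_patches(DIFFS))  # [['a3', 'b7']]
```

The cost is one diff and one hash per commit in the tree, which is the
brute force referred to above.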

    Two things make the above trivial history a bit more complicated.
    A) one side-branch can merge another, and build on top of changes that
       are not yet available on 'master'; the result can then appear in master
       via either one or both paths. This is why showing when and how a change
       became visible on every side branch can be interesting.

Quite.  I encountered this a few different ways and even when I fixed
it during the reverse parse, I failed to learn my lesson and it was a
problem during the forward parse.  I think the latest version is
fairly bullet-proof.

    B) when a side branch does not contain any new changes, but is
       made uptodate wrt master, the resulting history could end up
       like this:

     m-> m -> m -> m -> m -> m -> m ->   master
      \           /      \       /
       b -> b -> b        c ->  c ->    side-branch#1

       What happened was -- git "optimized" the simple merge away, turning it
       into a fast-forward, saving one merge commit, but losing the link
       connecting the 'c' and 'b' parts of 'side-branch#1'.

    Do you (anybody) happen to know a public repo, w/ history as above, ie
    w/ more than one long-lived branch that has seen some fast-forwards?
    I wonder how reliable recovering the missing link would be...

I have a real (non-public, sorry) tree that did something approaching
this:

->m->m->m->m->m---------m
       /     /         /
b->b->b->b->b------b->b->
 \     \     \    /
  t->t->t->t->t->t

However, due to fast-forwarding, it was turned into something like this:

->m->m->m->m->m---------m
       /     /         /
b->?->?->?->?------b->b->
 \     \     \    /
  t->t->t->t->t->t
  b  b  b  b  b  b

I don't think there is any way to figure out what happened given git's
available information.
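
Why the link is lost can be seen in a toy model: a true merge records a
commit with two parents, while a fast-forward only moves the branch
pointer, so nothing in the commit graph says which commits the side
branch used to own.  The `Repo` class below is a hypothetical
illustration, not git's data model in detail:

```python
class Repo:
    """Toy commit graph: commits map to parent lists, branches map to tips."""

    def __init__(self):
        self.parents = {}
        self.branches = {}
        self._count = 0

    def commit(self, branch, parents=None):
        """Create a commit on branch; default parent is the branch's tip."""
        if parents is None:
            tip = self.branches.get(branch)
            parents = [tip] if tip else []
        self._count += 1
        name = f"c{self._count}"
        self.parents[name] = parents
        self.branches[branch] = name
        return name

    def is_ancestor(self, old, new):
        """Return True if commit old is reachable from commit new."""
        stack, seen = [new], set()
        while stack:
            c = stack.pop()
            if c == old:
                return True
            if c not in seen:
                seen.add(c)
                stack.extend(self.parents[c])
        return False

    def merge(self, into, other, allow_ff=True):
        """Merge other into branch `into`, fast-forwarding when possible."""
        ours, theirs = self.branches[into], self.branches[other]
        if allow_ff and self.is_ancestor(ours, theirs):
            self.branches[into] = theirs   # pointer move, no new commit
            return theirs
        return self.commit(into, [ours, theirs])


def first_parent_chain(repo, branch):
    """Follow first parents from the branch tip back to the root."""
    chain, c = [], repo.branches[branch]
    while c:
        chain.append(c)
        c = repo.parents[c][0] if repo.parents[c] else None
    return chain


def build(allow_ff):
    repo = Repo()
    repo.commit("master")                              # c1
    repo.branches["side"] = repo.branches["master"]    # branch off
    repo.commit("side")                                # c2
    repo.commit("master")                              # c3
    repo.merge("master", "side")                       # c4, a true merge
    repo.commit("master")                              # c5
    # side has nothing new; bring it up to date with master
    repo.merge("side", "master", allow_ff=allow_ff)
    return repo


ff_repo = build(allow_ff=True)
noff_repo = build(allow_ff=False)
print(first_parent_chain(ff_repo, "side"))    # ['c5', 'c4', 'c3', 'c1']
print(first_parent_chain(noff_repo, "side"))  # ['c6', 'c2', 'c1']
```

With the fast-forward, the first-parent chain of `side` runs entirely
through commits made on `master`, and its own commit `c2` is no longer
on it.  With a real merge commit, the chain still passes through `c2`.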

I was just saying on #git a few hours ago, though, that I think git
needs a tree-anonymizing program.  As long as one does not go
overboard, it doesn't seem too difficult.  That probably means I just
have not thought about the problem hard enough.  Of course, it would
only replicate what is, not how you got there.

    And there's no reason why this operation should take ~20 minutes, even
    for the randomly chosen, but real, worst case. But finding a good repo
    to test w/ would take longer than writing the code...

It only takes 8 seconds per test on the linux kernel, which all things
considered is rather fast.  The real problem is that each test is
treated independently.  If someone got the complete history of the
project and built a tree out of it, it would be extremely fast to run
additional tests, even ignoring the obvious optimizations of not
re-searching known paths.
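
A minimal sketch of that idea, assuming the full history is already
loaded as a commit-to-parents map (for example, parsed from the output
of `git rev-list --parents --all`): invert the map once into a children
index, then answer each query by walking forward from the commit, with
results memoized so that known paths are not searched again.  The class
and function names and the toy history are hypothetical:

```python
from collections import defaultdict


def build_children_index(parents):
    """Invert a commit -> parents map into commit -> children, in one pass."""
    children = defaultdict(list)
    for commit, commit_parents in parents.items():
        for parent in commit_parents:
            children[parent].append(commit)
    return children


class HistoryIndex:
    """Answers 'which commits contain X?' repeatedly from one shared index."""

    def __init__(self, parents):
        self.children = build_children_index(parents)
        self._memo = {}

    def descendants(self, commit):
        """Return every commit that has `commit` in its history."""
        if commit in self._memo:
            return self._memo[commit]
        result = set()
        stack = list(self.children.get(commit, []))
        while stack:
            child = stack.pop()
            if child in result:
                continue
            result.add(child)
            if child in self._memo:
                # Reuse an earlier answer instead of re-walking this path.
                result |= self._memo[child]
            else:
                stack.extend(self.children.get(child, []))
        self._memo[commit] = result
        return result


# Toy history (hypothetical): b forks into c and d, which merge in e.
PARENTS = {"a": [], "b": ["a"], "c": ["b"], "d": ["b"], "e": ["c", "d"]}
index = HistoryIndex(PARENTS)
print(sorted(index.descendants("c")))  # ['e']
print(sorted(index.descendants("a")))  # ['b', 'c', 'd', 'e']
```

The second query reuses the stored answer for `c` rather than walking
from `c` to `e` again.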

The question is, will this functionality be needed often enough to
spend the time necessary to optimize it?

					-Seth Robertson