Re: in_merge_bases() is too expensive for recent "pu" update

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Junio C Hamano <gitster@xxxxxxxxx> writes:

> Junio C Hamano <gitster@xxxxxxxxx> writes:
>
>> Thomas Rast <trast@xxxxxxxxxxxxxxx> writes:
>>
>>> At the very least it should be possible to change in_merge_bases() to
>>> not do any of the post-filtering; perhaps like the patch below.
>>
>> I do not recall the details but the post-filtering was added after
>> the initial naive version without it that was posted to the list did
>> not work correctly in corner cases.  I wouldn't doubt that N*(N-1)
>> loop can be optimized to something with log(N), but I offhand find
>> it hard to believe "to not do any" could be correct without seeing
>> more detailed analysis.
>
> If "one on one side, many on the other side" in merge_bases_many()
> confuses any of you in the readers, you can just ignore many-ness.
> Getting merge bases for "one" against many "two"s using
> merge_bases_many() is exactly the same as getting merge bases for
> "one" against a (fictitious) single commit you can make by merging
> all "two"s.
>
> So we can think about the degenerate case of merge base between two
> commits A and B in the following discussion.
>
> A merge-base between A and B is defined to be a commit that is a
> common ancestor of both A and B, and that is not an ancestor of any
> other merge-base between A and B.

Ok, under that definition I really do think that it's "easy".

You have all the pieces here but one:

> Start from A and B.  Follow from B to find 'x' and paint it in blue,
> follow from A to find 'y' and paint it in amber.  Follow from 'x' to
> '1', paint it in blue.  Follow from 'y' to '1', paint it in amber
> but notice that it already is painted in blue.
[...]
>             o-------o
>            /         \
>           /       y---A
>          /       /
>      ---2---z---1---x---B
>          \         /
>           o-------o
[...]
> So we need to notice that '1' and '2' have ancestry relation in
> order to reject '2' as "common but not merge-base".  One way of
> doing so is not to stop at '1' and keep digging (and eventually we
> find that '2' is what we could reach from '1' that already is a
> merge base), but then we will be susceptible to the same kind of
> clock skew issue as the revision traverser.

I think that is *the* way to do it.

And the way to fix the clock skew issue (also in the revision traversal)
is Peff's generation number cache.  Just keep propagating marks, digging
in generation order, until the generations prove that nothing new can
happen.

  [Side note, in reply to an earlier comment in the rev-list thread: I
  agree with Peff's reasoning that a cache is better than a commit
  header.]

The precise termination condition should be:

  There are no more one-colored commits to walk, and

  The maximum generation of the remaining commits in the walk is less
  than the minimum generation of the merge base candidates

Then the generation numbers ensure that no commit in the walk (which by
then only propagates STALE marks) can possibly be a descendant of any of
the candidates.  At that point, the candidates are the true set of merge
bases.


I conjecture that every history walking problem can be solved in time
linear in the number of commits once we properly use the generation
numbers ;-)

-- 
Thomas Rast
trast@{inf,student}.ethz.ch
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]