Re: in_merge_bases() is too expensive for recent "pu" update

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Junio C Hamano <gitster@xxxxxxxxx> writes:

> Thomas Rast <trast@xxxxxxxxxxxxxxx> writes:
>
>> At the very least it should be possible to change in_merge_bases() to
>> not do any of the post-filtering; perhaps like the patch below.
>
> I do not recall the details but the post-filtering was added after
> the initial naive version without it that was posted to the list did
> not work correctly in corner cases.  I wouldn't doubt that N*(N-1)
> loop can be optimized to something with log(N), but I offhand find
> it hard to believe "to not do any" could be correct without seeing
> more detailed analysis.

If "one on one side, many on the other side" in merge_bases_many()
confuses any of you in the readers, you can just ignore many-ness.
Getting merge bases for "one" against many "two"s using
merge_bases_many() is exactly the same as getting merge bases for
"one" against a (fictitious) single commit you can make by merging
all "two"s.

So we can think about the degenerate case of merge base between two
commits A and B in the following discussion.

A merge-base between A and B is defined to be a commit that is a
common ancestor of both A and B, and that is not an ancestor of any
other merge-base between A and B.  In the simplest case (illustrated
below), 1 is a merge base between A and B, but 2 is not (even though
it is an ancestor of both A and B, it is an ancestor of 1 which is a
merge base).

                  y---A
                 /
     ---2---o---1---x---B

So, the thinking goes, in order to find all merge bases, first we
can traverse from A and B, following "parent" link from children,
and painting found parents in two colors.

Start from A and B.  Follow from B to find 'x' and paint it in blue,
follow from A to find 'y' and paint it in amber.  Follow from 'x' to
'1', paint it in blue.  Follow from 'y' to '1', paint it in amber
but notice that it already is painted in blue.  Stop the traversal
there and we found a candidate '1' that could be a merge base.  We
know digging from '1' will not find more merge bases, so we should
stop there (I do not recall offhand if the current code does stop
there, though) [*1*].

There may be other paths that are not depicted in the above
illustration that reach '2' from A and B without passing '1'
through.

            o-------o
           /         \
          /       y---A
         /       /
     ---2---z---1---x---B
         \         /
          o-------o

In such a history, we may stop after finding '1' and '2' in the
first pass of "stop when a node is painted in both amber and blue".
This way, the first pass will find _all_ merge bases, but it also
may find common ancestors that are not merge bases.

So we need to notice that '1' and '2' have ancestry relation in
order to reject '2' as "common but not merge-base".  One way of
doing so is not to stop at '1' and keep digging (and eventually we
find that '2' is what we could reach from '1' that already is a
merge base), but then we will be susceptible to the same kind of
clock skew issue as the revision traverser.  Instead, merge-base
traverser chose to do this by running the same "stop traversing at
common" traverser between the candidates (in this case, '1' and
'2').

The objective of this second traversal is very different from the
first one, though.  We do not need _all_ the merge bases between '1'
and '2'; we do not even need merge bases.

The only thing we need to prove that '1' is a merge base (i.e. not
an ancestor of any other merge base candidates the first round of
traversal between A and B found) is to do the first round of the
traversal for '1' as "one" and all the other ('2' in this case) as
"two"s; if the first round of such traversal finishes without
painting '1' in both colors, that would mean '1' is not reachable
from any other candidates of merge base between A and B, so we have
proven that '1' is a merge base.

So I suspect that the postprocess phase could be made from N*(N-1)
to N (run merge_bases_many() once for each of the common ancestor
the first round found).  You might also be able to stop the
traversal used in the first phase (i.e. get_merge_bases()) a bit
earlier, if we are digging through '1' to paint 'z' (and eventually
'2') in STALE (amber+blue) color, as digging further only to paint
things in STALE color is not necessary with the postprocess.


[Footnote]

*1* Digging through '2' down may find that other candidate merge
bases we reach by traversing other paths that may not be depicted in
the above illustration, and there may be such paths to reach '1'
from A and B that does not pass '2' through.  That is a possible
alternative way to reject common ancestor that is not merge-base.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]