Re: [PATCH v2 2/3] bisect: when skipping, choose a commit away from a skipped commit

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Le Friday 05 June 2009, Junio C Hamano a écrit :
> Christian Couder <chriscool@xxxxxxxxxxxxx> writes:
> > +static struct commit_list *apply_skip_ratio(struct commit_list *list,
> > +					    int count,
> > +					    int skip_num, int skip_denom)
> > +{
> > +	int index, i;
> > +	struct commit_list *cur, *previous;
> > +
> > +	cur = list;
> > +	previous = NULL;
> > +	index = count * skip_num / skip_denom;
> > +
> > +	for (i = 0; cur; cur = cur->next, i++) {
> > +		if (i == index) {
> > +			if (hashcmp(cur->item->object.sha1, current_bad_sha1))
> > +				return cur;
> > +			if (previous)
> > +				return previous;
>
> When you return "previous", don't you need to make sure it is not the
> currently bad one?  That is,...
>
> > +			return list;
> > +		}
> > +		previous = cur;
>
> ... shouldn't this be
>
>                 if (hashcmp(cur->item->object.sha1, curret_bad_sha1))
>                         previous = cur;
>
> > +	}

I think the current bad will always have the worst goodness value so it will 
always be the last item in the list, so it cannot be "previous".

You can try it out with "git rev-list --bisect-all HEAD ^HEAD~10" for 
example, the bad item (HEAD in this example) will always appear 
with "(dist=0)" at the end.

We rely on that in "bisect_next_all" to decide if we found the first bad 
commit. We do:

	bisect_rev = revs.commits->item->object.sha1;
	...

	if (!hashcmp(bisect_rev, current_bad_sha1)) {
		exit_if_skipped_commits(tried, current_bad_sha1);
		printf("%s is first bad commit\n", bisect_rev_hex);
		show_diff_tree(prefix, revs.commits->item);
		/* This means the bisection process succeeded. */
		exit(10);
	}

> I do not understand the math/logic you are using here.  The incoming list
> is sorted by the "goodness" value, and untestable ones have already been
> removed.

Yes.

> By picking ones that are far from the tip, you are deliberately 
> accepting far suboptimal bisection, but I do not see how you are
> guaranteeing that making such a trade-off is driving you away from the
> untestable ones. 

I think picking one commit far from the tip, doens't mean accepting far 
suboptimal bisection. HPA wrote in a recent thread (RFE: "git bisect 
reverse"):

"You don't drop to 1/2 bit of information until x = 0.11 or 0.89"

and I think the worst possible value with my code means x = 0.2 or x = 0.8 
(with the 3/5 ratio).

It's true that when using a ratio to skip away from an untestable one we 
don't check that we don't come near another untestable one. 

> It almost feels to me that it is no better than 
> randomly picking one from the list, so why can this be an improvement
> over "pick the one that is testable that appears first in the list that
> is sorted by the goodness value"?

We rely on the assumption that the probability that there are many 
untestable ones in many different places quite far away from an untestable 
one and from each other is very low.

For example using the following script:

-------------
#!/bin/sh

new_commit() {
    num="$1"
    echo "line for commit $num" >> hello
    git commit -m "commit $num" hello
}

touch hello
git add hello

for val in $(seq 100)
do
  new_commit $val
done
-------------

and then bisecting between the first and last commits gives something like:

$ git bisect start HEAD HEAD~99
Bisecting: 49 revisions left to test after this (roughly 6 steps)
[e65f680ee31729d2db66dbed7eb24ea8c0e8b6c3] commit 50
$ git bisect skip
Bisecting: 48 revisions left to test after this (roughly 6 steps)
[6cc0df3b9762928c90a5ce7ad4d27bf3340983c3] commit 51
$ git bisect skip
Bisecting: 48 revisions left to test after this (roughly 6 steps)
[45387fab3997ca8c449ac419cb401207aa63da4d] commit 71
$ git bisect skip
Bisecting: 48 revisions left to test after this (roughly 6 steps)
[dd20066ed8a287e8105b833911de7509ca5ec339] commit 40
$ git bisect skip
Bisecting: 48 revisions left to test after this (roughly 6 steps)
[8f0c63e6eede9650bb577dba6f3092703409509e] commit 20
$ git bisect skip
Bisecting: 48 revisions left to test after this (roughly 6 steps)
[4809a426fd411f2db43291bc5d206b199ca9e648] commit 30
$ git bisect skip
Bisecting: 48 revisions left to test after this (roughly 6 steps)
[095cc6a1c6f071f3807ca123ab5bcfc133f36f96] commit 61
$ git bisect skip
Bisecting: 48 revisions left to test after this (roughly 6 steps)
[94a00c0de61ed8ddef261bebfbbb00bc6e0a3a1f] commit 82
$ git bisect skip
Bisecting: 48 revisions left to test after this (roughly 6 steps)
[5d1a2332b7d5d409faebb7056e11028f5b7642f4] commit 29
$ git bisect skip
Bisecting: 48 revisions left to test after this (roughly 6 steps)
[ad46dfdfb0a871b13144395efe26607cb63c7f5b] commit 39

So after testing commits 50 and 51 that are near the middle, you test many 
commits far away from both commits 50 and 51 and from each other, and yet 
not too close to 0 or 100.

> Imagine you have a graph like this (G is good, B is bad, and both
> branches have similar number of commits):
>
> 	G-------------y--z--a--b--c--B
>                               /
>         G--------------f--e--d
>
> In the above graph, a and d would give the best bisection, because they
> can reach about half the number of nodes in the entire graph, and the
> goal of the basic "find_bisection()" is to find a commit that cuts the
> graph into closest-to-the-equal halves.  For this reason, 'a' and 'd'
> would be near the beginning of the output from find_bisection(find_all)
> you run when there are untestable commits in your managed_skipped().
>
> Suppose 'd' was already marked untestable, but 'a' is not.  And 'd' has
> slightly better "goodness" value than 'a'.
>
> 	Side note.  I do not think the situation should change drastically
> 	if 'a' has a better "goodness" value than 'd' has, but your
> 	"skipped_first" logic that I already said I do not understand in
> 	my earlier comment treats them quite differently, so this example
> 	explicitly talks about the case where 'd' is better than 'a'.
>
> You do not check 'a' but check somewhere much older, either on the top
> branch or on the bottom branch.  Why?

The reason is that it would make the code more complex to check that we are 
in this case, and that this case may not happen very often (it relies on 
both being near a merge and having branches of the same size), and that we 
don't loose much (see above my reference to what HPA wrote) by testing a 
commit quite far away.

> 'b', because it is direct descendant of 'd', is likely to have inherited
> the unrelated breakage from 'd' that made it untestable.  'c', being its
> descendant, can also be contaminated with the unrelated breakage, but
> there is a chance that such an unrelated breakage was fixed there.
> Similarly, 'e' is likely to share the same unrelated breakage as 'd' has,
> because it is a direct ancestor.  'f' shares the possibility but has a
> better chance of being testable than 'e' is.
>
> But 'a' does not have any relationship with 'd' and it is unlikely such
> an unrelated breakage exists there.
>
> I _think_ the solution to the "bisection with untestable" problem should
> be handled as an optimization problem of two goodness values that cannot
> be directly compared and traded-off:
>
>  - The point chosen should be close to the mid-point.  The closer
> "weight" value given in do_find_bisection() is to half the number of
> nodes, the better;

I am not so sure this is so important.

>  - The point chosen should be far from any known untestable commits.  We
>    do not currently have function to count distance from untestable
>    commits, nor make a call to such a function after filtering untestable
>    commits, but I think we should.

I think this should be the case with my patch series.

> We already have an efficient and good implementation to compute the
> former criteria (find_bisection).
>
> The "badness" value that comes from the latter criteria essentially
> should be how close, from ancestry point of view, each commit is related
> to the ones that are known to be untestable.  In the picture, 'b' and 'e'
> are close relatives (they have thick blood relationship with 'd') and
> worse than 'c' and 'f' but 'a' shouldn't be punished for being married to
> 'd' that has a bad blood at all.
>
> Now coming back to the "skipped_first" topic.  Let's use the same graph,
> but now let's assume neither 'a' nor 'd' is known to be untestable, and
> also 'a' gives slightly better "goodness" value than 'd' does.  Further
> suppose that 'z' was already known to be untestable this time.
>
> Even though 'a' may have a better goodness value, hence it is not
> skipped, shouldn't we be favoring 'd' over 'a' because 'a' is far likely
> to be untestable (due to closer blood tie with 'z') than 'd' is, and 'a'
> and 'd' would give roughly the same "goodness" value?
>
> This is why I said I do not understand your skipped_first logic.

I agree that we could probably analyse the graph much better than what my 
patch series does, but I think that it would be quite complex.

> > +static struct commit_list *managed_skipped(struct commit_list *list,
> > +					   struct commit_list **tried)
> > +{
> > +	int count, skipped_first;
> > +	int skip_num, skip_denom;
> > +
> > +	*tried = NULL;
> > +
> > +	if (!skipped_revs.sha1_nr)
> > +		return list;
> > +
> > +	if (!skip_num)
> > +		return filter_skipped(list, tried, 0, NULL, NULL);
>
> skip_num is uninitialized and then checked here.  Is it supposed to be
> "static int" with implied 0 initialization?

Ooops, I forgot to remove these 2 lines. They were usefull in my previous 
series but they should be removed in this one.

> > +	list = filter_skipped(list, tried, 0, &count, &skipped_first);
> > +
> > +	if (!skipped_first)
> > +		return list;
> > +
> > +	/* Use alternatively 1/5, 2/5 and 3/5 as skip ratio. */
> > +	skip_num = count % 3 + 1;
> > +	skip_denom = 5;
> > +
> > +	return apply_skip_ratio(list, count, skip_num, skip_denom);
> > +}

I will repost the series with the above fix and some added comments.

Thanks,
Christian.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]