Christian Couder <chriscool@xxxxxxxxxxxxx> writes:

> +static struct commit_list *apply_skip_ratio(struct commit_list *list,
> +                                            int count,
> +                                            int skip_num, int skip_denom)
> +{
> +        int index, i;
> +        struct commit_list *cur, *previous;
> +
> +        cur = list;
> +        previous = NULL;
> +        index = count * skip_num / skip_denom;
> +
> +        for (i = 0; cur; cur = cur->next, i++) {
> +                if (i == index) {
> +                        if (hashcmp(cur->item->object.sha1, current_bad_sha1))
> +                                return cur;
> +                        if (previous)
> +                                return previous;

When you return "previous", don't you need to make sure it is not the
currently bad one?  That is, ...

> +                        return list;
> +                }
> +                previous = cur;

... shouldn't this be

        if (hashcmp(cur->item->object.sha1, current_bad_sha1))
                previous = cur;

> +        }

I do not understand the math/logic you are using here.

The incoming list is sorted by the "goodness" value, and untestable
ones have already been removed.  By picking ones that are far from the
top of that list, you are deliberately accepting a far suboptimal
bisection, but I do not see how you are guaranteeing that making such
a trade-off is driving you away from the untestable ones.  It almost
feels to me that it is no better than randomly picking one from the
list, so why would this be an improvement over "pick the first
testable one that appears in the list sorted by the goodness value"?

Imagine you have a graph like this (G is good, B is bad, and both
branches have a similar number of commits):

    G-------------y--z--a--b--c--B
                          /
    G--------------f--e--d

In the above graph, 'a' and 'd' would give the best bisection, because
they can reach about half the number of nodes in the entire graph, and
the goal of the basic "find_bisection()" is to find a commit that cuts
the graph into closest-to-equal halves.  For this reason, 'a' and 'd'
would be near the beginning of the output of the
find_bisection(find_all) call you make in your managed_skipped() when
there are untestable commits.

Suppose 'd' was already marked untestable, but 'a' is not, and 'd' has
a slightly better "goodness" value than 'a'.

    Side note.  I do not think the situation should change drastically
    if 'a' has a better "goodness" value than 'd' has, but your
    "skipped_first" logic that I already said I do not understand in
    my earlier comment treats them quite differently, so this example
    explicitly talks about the case where 'd' is better than 'a'.

You do not check 'a' but check somewhere much older, either on the top
branch or on the bottom branch.  Why?

'b', because it is a direct descendant of 'd', is likely to have
inherited the unrelated breakage from 'd' that made it untestable.
'c', being its descendant, can also be contaminated with the unrelated
breakage, but there is a chance that such an unrelated breakage was
fixed there.  Similarly, 'e' is likely to share the same unrelated
breakage as 'd' has, because it is a direct ancestor.  'f' shares the
possibility but has a better chance of being testable than 'e' does.
But 'a' does not have any ancestry relationship with 'd', and it is
unlikely that such an unrelated breakage exists there.

I _think_ the solution to the "bisection with untestable" problem
should be handled as an optimization problem of two goodness values
that cannot be directly compared and traded off:

 - The point chosen should be close to the mid-point.  The closer the
   "weight" value given in do_find_bisection() is to half the number
   of nodes, the better;

 - The point chosen should be far from any known untestable commits.
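
To make that trade-off concrete, here is a rough, self-contained
sketch of the idea.  None of this is the actual bisect code: the
candidate list, the weights, the "distance to the nearest skipped
commit" numbers and the combine_score() formula are all made up for
illustration (the numbers are eyeballed from the example graph above,
with 'd' skipped), and how the two terms should really be weighted
against each other is exactly the open question.

#include <stdio.h>
#include <stdlib.h>

/* One candidate commit, reduced to the two values discussed above. */
struct candidate {
        const char *name;
        int weight;           /* nr of commits it can reach (find_bisection) */
        int dist_to_skipped;  /* ancestry distance to the nearest skipped one */
};

/*
 * Hypothetical scoring, smaller is better: the first term punishes
 * being far from the mid-point, the second punishes being a close
 * relative of a skipped commit (zero distance gets the full penalty).
 */
static int combine_score(const struct candidate *c, int total)
{
        int mid = total / 2;
        int from_mid = abs(c->weight - mid);
        int skip_penalty = c->dist_to_skipped ? mid / c->dist_to_skipped : mid;

        return from_mid + skip_penalty;
}

int main(void)
{
        /* The example graph, with 'd' skipped; 99 means "unrelated to 'd'". */
        struct candidate cands[] = {
                { "a", 3, 99 },  /* good mid-point, no blood tie with 'd' */
                { "b", 7,  1 },  /* direct descendant of 'd' */
                { "c", 8,  2 },
                { "e", 2,  1 },  /* direct ancestor of 'd' */
                { "f", 1,  2 },
        };
        int total = 9;
        int i, best = 0, n = sizeof(cands) / sizeof(cands[0]);

        for (i = 1; i < n; i++)
                if (combine_score(&cands[i], total) <
                    combine_score(&cands[best], total))
                        best = i;

        printf("would test '%s' next\n", cands[best].name);
        return 0;
}

This prints "would test 'a' next", which is the choice the example
above argues for.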
We do not currently have a function to count the distance from
untestable commits, nor do we call such a function after filtering the
untestable commits out, but I think we should.  We already have an
efficient and good implementation to compute the former criterion
(find_bisection).  The "badness" value that comes from the latter
criterion should essentially be how closely, from an ancestry point of
view, each commit is related to the ones that are known to be
untestable.

In the picture, 'b' and 'e' are close relatives of 'd' (they have a
thick blood relationship with it) and are worse choices than 'c' and
'f', but 'a' should not be punished at all for being married to 'd',
which has bad blood.

Now coming back to the "skipped_first" topic.  Let's use the same
graph, but now let's assume neither 'a' nor 'd' is known to be
untestable, and also that 'a' gives a slightly better "goodness" value
than 'd' does.  Further suppose that 'z' was already known to be
untestable this time.

Even though 'a' may have a better goodness value and hence is not
skipped, shouldn't we be favoring 'd' over 'a', because 'a' is far
more likely to be untestable (due to its closer blood tie with 'z')
than 'd' is, and 'a' and 'd' would give roughly the same "goodness"
value?  This is why I said I do not understand your skipped_first
logic.

> +static struct commit_list *managed_skipped(struct commit_list *list,
> +                                           struct commit_list **tried)
> +{
> +        int count, skipped_first;
> +        int skip_num, skip_denom;
> +
> +        *tried = NULL;
> +
> +        if (!skipped_revs.sha1_nr)
> +                return list;
> +
> +        if (!skip_num)
> +                return filter_skipped(list, tried, 0, NULL, NULL);

skip_num is uninitialized and then checked here.  Is it supposed to be
"static int" with implied 0 initialization?

> +        list = filter_skipped(list, tried, 0, &count, &skipped_first);
> +
> +        if (!skipped_first)
> +                return list;
> +
> +        /* Use alternatively 1/5, 2/5 and 3/5 as skip ratio. */
> +        skip_num = count % 3 + 1;
> +        skip_denom = 5;
> +
> +        return apply_skip_ratio(list, count, skip_num, skip_denom);
> +}
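
Leaving aside the uninitialized skip_num question, here is a tiny
standalone illustration of what the arithmetic in the quoted code ends
up doing (the sample counts below are made up).  Note that the ratio
it picks depends only on count, not on where the skipped commits sit
in the graph, which is part of the question above:

#include <stdio.h>

int main(void)
{
        int counts[] = { 5, 10, 16, 33 };
        int i;

        for (i = 0; i < 4; i++) {
                int count = counts[i];
                int skip_num = count % 3 + 1;    /* cycles through 1, 2, 3 */
                int skip_denom = 5;
                int index = count * skip_num / skip_denom;

                printf("count=%2d -> ratio %d/%d -> index %2d\n",
                       count, skip_num, skip_denom, index);
        }
        return 0;
}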