Re: [RFC] bisect: Introduce skip-when to automatically skip commits

Olliver Schinagl <oliver@xxxxxxxxxxx> · Wed, 10 Apr 2024 12:39:03 +0200

On 08-04-2024 18:49, Junio C Hamano wrote:
phillip.wood123@xxxxxxxxx writes:

get_terms() wants to read the first line into `term_bad` and the
second line into `term_good` so it makes sense that it uses two
calls to `strbuf_getline()` to do that. It does not want to read
the whole file into a single buffer as we do here.
Right, but I why not use strbuf_getline()?
Because you want the whole file, not just one line as the script name
could potentially contain a newline
It is technically true, but it somehow sounds like an implausible
scenario to me.  The real reason why read_file() is preferrable is
because you do not have to write, and we do not want to see you write,
the whole "open (and handle error), read, chomp, and return" sequence.

I would even suspect that get_terms() is a poorly written
anti-pattern.  If I were adding that function to the system today, I
wouldn't be surprised if I did read_file() the whole thing and
worked in-core to split two items out.

So I've peaked at it, and think something like:

+int bisect_read_terms(const char **read_bad, const char **read_good)
+{
+       struct strbuf sb = STRBUF_INIT;
+
+       if (!strbuf_read_file(&sb, git_path_bisect_terms(), 0)) {
+               *read_bad = "bad";
+               *read_good = "good";
+               return -1;
+       }
+
+       terms = strbuf_split(&sb);
+       *term_bad = strbuf_detach(terms[0], NULL);
+       *term_good = strbuf_detach(terms[1], NULL);
+
+       strbuf_release(&sb);
+
+       return 0;
+}

would do the trick. This function could then be called from 
builtin/bisect.c as well to have a single interface. Right now, there's 
two ways to do the same thing, just because the arguments to the 
function are different, and the body is slightly different, but the same.

Shall I send a MR with this?

If I understand correctly we're encouraging the user to run "git
bisect skip" from the post checkout script. Doesn't that mean we'll
end up with a set of processes that look like

	- git bisect start
	  - post checkout script
             - git bisect skip
               - post checkout script
                 - git bisect skip
                   ...

as the "git bisect start" is waiting for the post checkout script to
finish running, but that script is waiting for "git bisect skip" to
finish running and so on. Each of those processes takes up system
resources, similar to how a recursive function can exhaust the
available stack space by calling itself over and over again.
True.  What such a post-checkout script can do is to only mark the
HEAD as "untestable", just like a run script given to "bisect run"
signals that fact by returnint 125.  And at that point, I doubt it
makes sense to add such a post-checkout script for the purpose of
allowing "bisect skip".

Having said that, a post-checkout script and pre-resume script may
have a huge value in helping those whose tests cannot be automated
(in other words, they cannot do "git bisect run") when they need to
tweak the working tree during bisection.  We all have seen, during a
bisection session that spans a segment of history that has another
bug that affects our test *but* is orthogonal to the bug we are
chasing, that we "cherry-pick --no-commit" the fix for that other
problem inside "git bisect run" script.  It might look something
like

     #!/bin/sh
     if git merge-base --is-ancestor $the_other_bug HEAD
     then
	# we need the fix
	git cherry-pick --no-commit $fix_for_the_other_bug ||
	exit 125
     fi

     make test
     status=$?
     git reset --hard ;# undo the cherry-pick
     exit $status

But to those whose test is not a good match to "git bisect run", if
we had a mechanism to tweak the checked out working tree after the
"bisect next" (which is an internal mechanism that "bisect good",
"bisect bad", and "bisect skip" share to give you the next HEAD and
the working tree to test) checks out the working tree before it
gives the control back to you, we could split the above script into
two parts and throw the "conditionally cherry-pick the fix" part
into that mechanism.  We'd need to have a companion script to "redo
the damage" (the "reset --hard" in the above illustration) if this
were to work seamlessly.  That obviously is totally orthogonal to
what we are discussing in this thread, but may make a good #leftoverbits
material (but not for novices).

While completly orthonogal, I agree; it would be nice to have and 
'abuse' for the bisect-skip usecase. So if we ignore the fact that it 
can be abused for this (which I don't think is a bad thing, it just 
risks the recursive issue Phillip mentioned.

As I'm not familiar with the deeps of bisect, I just use it as a dumb 
simple user, e.g. start; good, bad; I'm not sure the usecase you are 
describing is completly clear to me.

Are you saying 'git bisect run` is great, but not useful in all 
situations, and so in some cases, we want what you just said? Or would 
this also be part of git bisect run?

I've drafted the post-checkout and pre-resume here: 
https://gitlab.com/olliver/git/-/commit/6b5415377600551c0d94a359fd4b8ca7a3678dcf 
where I'm not clear on what the best points are for for the pre/post 
points. I've put the 'pre' bit in the bisect_state function, as that was 
being triggered by many suboptions, but might not be correct based on 
your answer to the above 
(https://gitlab.com/olliver/git/-/commit/6b5415377600551c0d94a359fd4b8ca7a3678dcf#46324e17f99db64a67eb9a5983ffc3a680914ee3_1001_1028). 
The post-checkout part, I've put in bisect_next 
(https://gitlab.com/olliver/git/-/commit/43993fca32f174f1005c7a445887c0ba5c4036b5#46324e17f99db64a67eb9a5983ffc3a680914ee3_672_717) 
which seems to match what you described.

Olliver