Re: [RFC PATCH v5] Implement --first-parent for git rev-list --bisect

Junio C Hamano <gitster@xxxxxxxxx> · Thu, 28 Jun 2018 09:12:30 -0700

Johannes Schindelin <Johannes.Schindelin@xxxxxx> writes:

> On Wed, 27 Jun 2018, Junio C Hamano wrote:
>
>> Johannes Schindelin <Johannes.Schindelin@xxxxxx> writes:
>> 
>> > 	git rev-list --bisect-all --first-parent F..E >revs &&
>> > 	# only E, e1..e8 should be listed, nothing else
>> > 	test_line_count = 9 revs &&
>> > 	for rev in E e1 e2 e3 e4 e5 e6 e7 e8
>> > 	do
>> > 		grep "^$(git rev-parse $rev) " revs || return
>> > 	done
>> >
>> > I am faster by... a lot. Like, seconds instead of minutes.
>> 
>> I'm fine either way.  I just thought you would not want 9 separate
>> invocations of grep ;-)
>
> I don't.
>
> I like the unnecessary test_commit calls even less ;-)
>
> And yes, we could avoid those `grep` calls, I know. By something as
> convoluted as

I now think you are reading me wrong ;-)  

I think you already established pretty well that it is a good idea
to avoid introducing different history to run the new test on, when
there is perfectly usable one available.  That part, i.e. test_commit,
I do not think we have anything to disagree about.

What I meant by "many separte grep calls" was to contrast these two
approaches:

 * Have one typical output spelled out as "expect", take an output
   from an actual run into "actual", make them comparable and then
   do a compare (which does not use grep; it uses sort in the
   "making comparable" phase)

 * Not have any typical output, take an output from an actual run,
   and have _code_ that inspects the output encode the rule over the
   output (e.g. "these must exist" is inspected with many grep
   invocations)

Two things the "output must have 9 entries, and these 9 must be
mentioned" we see at the beginning of this message does not verify
are (1) exact dist value given to each of these entries and (2) the
order in which these entries appear in the output.  The latter is
something we document, and the test should cover.  The former does
not have to be cast in stone (i.e. I do not think it does not make a
difference to label the edge commits with dist=1 or dist=0 as long
as everything is consistent), but if there is no strong reason to
keep it possible for us to later change how the numbers are assigned,
I am OK if the test cast it in stone.

Another reason (other than "many separate invocation of grep") to
favor the former approach (i.e. use real-looking expected output,
munge it and the real output into comparable forms and then compare)
is that it is easier to see what aspect of the output we care (and
we do not care) about.

It is harder to see the omission of exact dist value and ordering of
entries in the outpu in the latter approach, and more importantly,
know if the omission was deliberate (e.g. it was merely an example)
or a mere mistake.

With "using a real-looking expected output, make it and the actual
output comparable and then compare" approach, the aspects in the
output we choose not to care about will show in the "make them
comparable" munging.  If we do not care the exact dist values, there
would be something like s/dist=[0-9]*/dist=X/ to mask the exact
value before making the two comparable to see that the expect and
the actual files have the same entries.  If we still do care about
the entries are ordered by the dist values, there would be something
that sorts the entries with the actual dist values before doing that
masking to ensure if the order is correct.