Re: weird diff output?

Stefan Beller <sbeller@xxxxxxxxxx> · Tue, 29 Mar 2016 11:16:57 -0700

On Tue, Mar 29, 2016 at 10:54 AM, Junio C Hamano <gitster@xxxxxxxxx> wrote:
> Stefan Beller <sbeller@xxxxxxxxxx> writes:
>
>> I thought this is an optimization for C code where you have a diff like:
>>
>>     int existingStuff1(..) {
>>     ...
>>     }
>>     +
>>     + int foo(..) {
>>     +...
>>     +}
>>
>>     int existingStuff2(...) {
>>     ...
>>
>> Note that the closing '}' could be taken from the method existingStuff1 instead
>> of correctly closing foo.
>
> That is a less optimal output.  Another possible output would be
> like so:
>
>       int existingStuff1(..) {
>       ...
>       }
>
>      + int foo(..) {
>      +...
>      +}
>      +
>       int existingStuff2(...) {
>
> All three are valid output, and ...
>
>> So the correct heuristic really depends on what kind of text we
>> are diffing.
>
> ... this realization is correct.
>
> I have a feeling that any heuristic would be correct half of the
> time, including the ehuristic implemented in the current code.  The
> readers of patches have inherent bias.  They do not notice when the
> hunk is formed to match their expectation, but they will notice and
> remember when they see something less optimal.
>

We have 3 possible diffs:
1) closing brace and newline before the chunk
2) newline before, closing brace after the chunk
3) closing brace and newline after the chunk

For C code we may want to conclude that 3) is best. (appeals the bias of
most people) 2 is slightly worse, whereas 1) is absolutely worst.

Now looking at the code Jacob found strange:

>  cat > expect <<EOF
> + expected results ...
> + EOF
> +test_expect_failure  ... '
> + ...
> + '
> +
> +cat > expect <<EOF

This can be written in two ways:

1) "cat > expect <<EOF" before the diff chunk
2) "cat > expect <<EOF" after the diff chunk

We claim 1) is better than 2).
This is different from the C code as now we want to have the
same lines before not after.

To find a heuristic, which appeals both the C code
and the shell code, we could take the empty line
as a strong hint for the divider:

1) determine the amount of diff which is ambiguous, i.e. can
   go before or after the chunk.
2) Does the ambiguous part contain an empty line?
3) If not, I have no offer for you, stop.
4) divide the ambiguous chunk by the empty line,
5) put the lines *after* the empty line in front of the chunk
6) put the part before (including) the empty line after the
   chunk
7) Observe output:

>       }
>
>      + int foo(..) {
>      +...
>      +}
>      +
>       int existingStuff2(...) {

> test_expect_failure ... '
> existing test ...
> '
>
> + cat > expect <<EOF
> + expected results ...
> + EOF
> +test_expect_failure  ... '
> + ...
> + '
> +
> cat > expect <<EOF

This is what we want in both cases.
And I would argue it would appease many other kinds of text as well, because
an empty line is usually a strong indicator for any text that a
different thing comes along.
(Other programming languages, such as Java, C++ and any other C like
language behaves
that way; even when writing latex figures you'd rather want to break
at new lines?)

Thanks,
Stefan
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html