Re: weird diff output?

Jacob Keller <jacob.keller@xxxxxxxxx> · Tue, 29 Mar 2016 16:05:57 -0700

On Tue, Mar 29, 2016 at 11:16 AM, Stefan Beller <sbeller@xxxxxxxxxx> wrote:
> On Tue, Mar 29, 2016 at 10:54 AM, Junio C Hamano <gitster@xxxxxxxxx> wrote:
>> Stefan Beller <sbeller@xxxxxxxxxx> writes:
>>
>>> I thought this is an optimization for C code where you have a diff like:
>>>
>>>     int existingStuff1(..) {
>>>     ...
>>>     }
>>>     +
>>>     + int foo(..) {
>>>     +...
>>>     +}
>>>
>>>     int existingStuff2(...) {
>>>     ...
>>>
>>> Note that the closing '}' could be taken from the method existingStuff1 instead
>>> of correctly closing foo.
>>
>> That is a less optimal output.  Another possible output would be
>> like so:
>>
>>       int existingStuff1(..) {
>>       ...
>>       }
>>
>>      + int foo(..) {
>>      +...
>>      +}
>>      +
>>       int existingStuff2(...) {
>>
>> All three are valid output, and ...
>>
>>> So the correct heuristic really depends on what kind of text we
>>> are diffing.
>>
>> ... this realization is correct.
>>
>> I have a feeling that any heuristic would be correct half of the
>> time, including the ehuristic implemented in the current code.  The
>> readers of patches have inherent bias.  They do not notice when the
>> hunk is formed to match their expectation, but they will notice and
>> remember when they see something less optimal.
>>
>
> We have 3 possible diffs:
> 1) closing brace and newline before the chunk
> 2) newline before, closing brace after the chunk
> 3) closing brace and newline after the chunk
>
> For C code we may want to conclude that 3) is best. (appeals the bias of
> most people) 2 is slightly worse, whereas 1) is absolutely worst.
>
> Now looking at the code Jacob found strange:
>
>>  cat > expect <<EOF
>> + expected results ...
>> + EOF
>> +test_expect_failure  ... '
>> + ...
>> + '
>> +
>> +cat > expect <<EOF
>
> This can be written in two ways:
>
> 1) "cat > expect <<EOF" before the diff chunk
> 2) "cat > expect <<EOF" after the diff chunk
>
> We claim 1) is better than 2).
> This is different from the C code as now we want to have the
> same lines before not after.
>
> To find a heuristic, which appeals both the C code
> and the shell code, we could take the empty line
> as a strong hint for the divider:
>
> 1) determine the amount of diff which is ambiguous, i.e. can
>    go before or after the chunk.
> 2) Does the ambiguous part contain an empty line?
> 3) If not, I have no offer for you, stop.
> 4) divide the ambiguous chunk by the empty line,
> 5) put the lines *after* the empty line in front of the chunk
> 6) put the part before (including) the empty line after the
>    chunk
> 7) Observe output:
>
>>       }
>>
>>      + int foo(..) {
>>      +...
>>      +}
>>      +
>>       int existingStuff2(...) {
>
>> test_expect_failure ... '
>> existing test ...
>> '
>>
>> + cat > expect <<EOF
>> + expected results ...
>> + EOF
>> +test_expect_failure  ... '
>> + ...
>> + '
>> +
>> cat > expect <<EOF
>
> This is what we want in both cases.
> And I would argue it would appease many other kinds of text as well, because
> an empty line is usually a strong indicator for any text that a
> different thing comes along.
> (Other programming languages, such as Java, C++ and any other C like
> language behaves
> that way; even when writing latex figures you'd rather want to break
> at new lines?)
>
> Thanks,
> Stefan

This seems like a good heuristic. Can we think of any examples where
it would produce wildly confusing diffs? I don't think it necessarily
needs to be default but just a possible option when formatting diffs,
much like we already have today.

Thanks,
Jake
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html