Re: Consequences of CRLF in index?

Junio C Hamano <gitster@xxxxxxxxx> · Thu, 26 Oct 2017 14:54:51 +0900

Stefan Beller <sbeller@xxxxxxxxxx> writes:

>> diff --git a/diff.c b/diff.c
>> index 6fd288420b..eeca0fd3b8 100644
>> --- a/diff.c
>> +++ b/diff.c
>> @@ -4202,7 +4202,8 @@ void diff_setup_done(struct diff_options *options)
>>
>>         if (DIFF_XDL_TST(options, IGNORE_WHITESPACE) ||
>>             DIFF_XDL_TST(options, IGNORE_WHITESPACE_CHANGE) ||
>> -           DIFF_XDL_TST(options, IGNORE_WHITESPACE_AT_EOL))
>> +           DIFF_XDL_TST(options, IGNORE_WHITESPACE_AT_EOL) ||
>> +           DIFF_XDL_TST(options, IGNORE_CR_AT_EOL))
>
> This highlights another part of the flag macros, that could be made nicer.
> All these flags combined are XDF_WHITESPACE_FLAGS, so this
> if could be written without the macros as
>
>   if (options->xdl_ops & XDF_WHITESPACE_FLAGS)

Yes, and I think the codepath that matters most already uses that
form.  Perhaps it is a good idea to do the rewrite without a macro
(XDF_WHITESPACE_FLAGS is already a macro enough).

> (1<<5) is taken twice now.

Good eyes.  I think we use bits #1-#8 now (bit #0 is vacant, so are
#9-#31).

>> diff --git a/xdiff/xutils.c b/xdiff/xutils.c
>> index 04d7b32e4e..8720e272b9 100644
>> --- a/xdiff/xutils.c
>> +++ b/xdiff/xutils.c
>> @@ -156,6 +156,21 @@ int xdl_blankline(const char *line, long size, long flags)
>>         return (i == size);
>>  }
>>
>> +/*
>> + * Have we eaten everything on the line, except for an optional
>> + * CR at the very end?
>> + */
>> +static int ends_with_optional_cr(const char *l, long s, long i)
>> +{
>> +       if (s && l[s-1] == '\n')
>> +               s--;
>
> so first we cut off the '\n',

> That seems correct after some thought;

I added the "trim the LF at the end" to the beginning of the
function at the last minute to cheat, and it is debatable if it is
entirely correct on an incomplete line.

The byte at the end of line, i.e. l[s-1], could be either '\n' or
something else, and the latter is an incompete line at the end of
the file.  When we trimmed the LF and decremented s, CR at l[s-1]
is the CR in CRLF, which we do want to ignore.  If we didn't, then
what is CR sitting at l[s-1]?  It is a lone CR at the end of file,
not a part of CRLF.  Do we really want to ignore it?

If we take the name of the option "ignore-cr-at-eol" literally, yes,
it is a CR sitting at the end of a line, which happens to be an
incomplete one, so we do want to ignore.  But if we think about the
reason why we are adding the option (i.e. to help conversion between
CRLF and LF), it is somewhat iffy.  The lone CR at the end of file
cannot be something that came from CRLF<->LF conversion, and ignoring
it would hide possible problems in conversion or the original data.

> I might offer
> another easier to understand (for me) solution,
> ...
> Though this seems even more complicated
> after having it written down.

This happens to me quite often and my solution to it is to remove
the alternative I tried to formulate after convincing myself that it
is not that much of an improvement ;-).