Re: [PATCH/RFC] File commited with CRLF should roundtrip diff and apply

Junio C Hamano <gitster@xxxxxxxxx> · Mon, 14 Aug 2017 10:37:43 -0700

tboegi@xxxxxx writes:

> From: Torsten Bögershausen <tboegi@xxxxxx>
>
> When a file had been commited with CRLF and core.autocrlf is true,
> the following does not roundtrip, `git apply` fails:
>
> printf "Added line\r\n" >>file &&
> git diff >patch &&
> git checkout -- . &&
> git apply patch

Should this tweak be in effect when we are paying attention to the
contents of the index?  Or does it matter only when we are doing
"git apply" without either "--index" or "--cache"?  

The fact that the new flag that is passed from load_preimage() down
to read_old_data(), the latter of which is only about the "behave as
a better 'patch -p1', ignoring the index" mode, hints that this
should not affect anything when we are paying attention to the
index, but then calling the load_patch_target() helper with a very
generic CR_AT_EOL flag looks a bit misleading, by making it appear
as if the caller is telling all three cases to do the funny CR
business.  I wonder if we instead want to pass the "patch" structure
down the callchain and have _only_ read_old_data() pay attention to
the has_crlf bit in it---that way, it becomes more obvious what is
going on.

I also notice that read_old_data() still passes &the_index to
convert_to_git().  Can we fix convert_to_git() further to allow NULL
as the istate parameter when we tell it not to look at the index ( I
am reading your passing SAFE_CRLF_KEEP and SAFE_CRLF_FALSE as a way
for the caller to tell that the caller knows what it wants already
and does not want covnert_to_git() to look at the index)?

With or without CR in the patch, I do not see any reason for the
codepath from read_old_data() down to the convert API to look at the
index, ever, as read_old_data() is called only when (!state->cached
&& !state->check_index), i.e. when we are working as a better 'patch
-p1' without even having to know that we are in a Git repository, by
load_patch_target().

> diff --git a/apply.c b/apply.c
> index f2d599141d..63455cd65f 100644
> --- a/apply.c
> +++ b/apply.c
> @@ -220,6 +220,7 @@ struct patch {
>  	unsigned int recount:1;
>  	unsigned int conflicted_threeway:1;
>  	unsigned int direct_to_threeway:1;
> +	unsigned int has_crlf:1;
>  	struct fragment *fragments;
>  	char *result;
>  	size_t resultsize;
> @@ -1662,6 +1663,17 @@ static void check_whitespace(struct apply_state *state,
>  	record_ws_error(state, result, line + 1, len - 2, state->linenr);
>  }
>  
> +/* Check if the patch has context lines with CRLF or
> +   the patch wants to remove lines with CRLF */

Style.

> +static void check_old_for_crlf(struct patch *patch, const char *line, int len)
> +{
> +	if (len >= 2 && line[len-1] == '\n' && line[len-2] == '\r') {
> +		patch->ws_rule |= WS_CR_AT_EOL;
> +		patch->has_crlf = 1;
> +	}
> +}

I am wondering where the discrepancy between the names (old crlf
here, as opposed to "struct patch" that says "has_crlf" implying it
does not care which side between old and new has one) comes from.

Is the intention that you only care about what is expected of the
preimage?  

> @@ -1712,11 +1724,13 @@ static int parse_fragment(struct apply_state *state,
>  			if (!deleted && !added)
>  				leading++;
>  			trailing++;
> +			check_old_for_crlf(patch, line, len);
>  			if (!state->apply_in_reverse &&
>  			    state->ws_error_action == correct_ws_error)
>  				check_whitespace(state, line, len, patch->ws_rule);
>  			break;

If check_old is about "we care about the lines that are supposed to
match what currently exist in the target file to be patched", then
the call to it also must pay attention to apply_in_reverse.  What
appears on '+' lines in a patch will become the "expected old
contents the patched target is supposed to have" when we are running
"apply -R".

>  		case '-':
> +			check_old_for_crlf(patch, line, len);
>  			if (state->apply_in_reverse &&
>  			    state->ws_error_action != nowarn_ws_error)
>  				check_whitespace(state, line, len, patch->ws_rule);

Likewise.

Thanks for working on this; unlike the previous one I commented, I
think this one I can sort of see how this is an approach to fix it.