Re: [PATCH v2 2/3] ref-filter: fix the API to correctly handle CRLF

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



"Philippe Blain via GitGitGadget" <gitgitgadget@xxxxxxxxx> writes:

> From: Philippe Blain <levraiphilippeblain@xxxxxxxxx>
>
> The ref-filter API does not correctly handle commit or tag messages that
> ...

(I won't repeat myself here; see 0/3)

> Signed-off-by: Philippe Blain <levraiphilippeblain@xxxxxxxxx>
> ---
>  ref-filter.c             | 19 +++++++++++++++++--
>  t/t3203-branch-output.sh | 26 +++++++++++++++++++++-----
>  t/t6300-for-each-ref.sh  |  5 +++++
>  t/t7004-tag.sh           |  7 +++++++
>  4 files changed, 50 insertions(+), 7 deletions(-)
>
> diff --git a/ref-filter.c b/ref-filter.c
> index 79bb5206783..537cc4de42c 100644
> --- a/ref-filter.c
> +++ b/ref-filter.c
> @@ -1050,10 +1050,18 @@ static char *copy_subject(const char *buf, unsigned long len)
>  {
>  	char *r = xmemdupz(buf, len);
>  	int i;
> +	struct strbuf sb = STRBUF_INIT;
>  
> -	for (i = 0; i < len; i++)
> +	strbuf_attach(&sb, r, len, len + 1);
> +	for (i = 0; i < sb.len; i++) {
>  		if (r[i] == '\n')
>  			r[i] = ' ';
> +		if (r[i] == '\r') {
> +			strbuf_remove(&sb, i, 1);
> +			i -= 1;
> +		}
> +	}
> +	strbuf_detach(&sb, NULL);
>  	return r;
>  }

So, the chosen solution is to only remove CR and do so only in the
first paragraph.  Even if the second and subsequent paragraphs use
CRLF line endings, those CRs are retained.  Also, a lone CR in the
first paragraph that is not part of CRLF end-of-line marker is lost,
but other control characters like "\a" are retained.  That sounds
like an almost "minimum" change but not quite.

The way strbuf is used in the implementation is a bit curious and
risky.  Currently we do not realloc to shrink a strbuf, but when we
start doing so, this code would break because you are relying on the
fact that just before calling strbuf_detach(), sb.buf happens to be
still the same as r.

As the point of the function is "we want to return a copy of what is
in buf[0..len] but the input is a (possibly multi-line) paragraph,
and we want a single line 'title', so replace end-of-line with a
SP", a minimal translation that is more faithful to the intended
meaning of the function would be:

	static char *copy_subject(const char *buf, unsigned long len)
	{
		struct strbuf sb = STRBUF_INIT;

		for (i = 0; i < len; i++) {
                	if (buf[i] == '\r' &&
			    i + 1 < len && buf[i + 1] == '\n')
                            continue; /* ignore CR in CRLF */

			if (buf[i] == '\n')
				strbuf_addch(&sb, ' ');
			else
				strbuf_addch(&sb, buf[i]);
		}
                return strbuf_detach(&sb, NULL);
	}

perhaps?  This retains CR in the middle if exists just like BEL in
the middle of the line, and uses strbuf in a safe way, I think.

> @@ -1184,15 +1192,22 @@ static void find_subpos(const char *buf,
>  		eol = strchrnul(buf, '\n');
>  		if (*eol)
>  			eol++;
> +		/*  protect against messages that might contain \r\n */
> +		if (*eol == '\r')
> +			eol++;

This is quite convoluted.  You found a LF and then are hoping to see
if the byte after LF is CR (i.e. you are looking for LFCR, not
CRLF).

>  		buf = eol;
>  	}
>  	*sublen = buf - *sub;
>  	/* drop trailing newline, if present */
>  	if (*sublen && (*sub)[*sublen - 1] == '\n')
>  		*sublen -= 1;
> +	/*  protect against commit messages that might contain \r\n */
> +	else if (*sublen && (*sub)[*sublen - 1] == '\r')
> +		*sublen -= 3; /* drop '\r\n\r' */

Yeek.  To find CR-LF-CR-LF, you look for CR-LF-CR?  You only checked
that the previous byte is NOT LF (because you are in else-if, so the
previous if must have failed) and you have at least one previous byte
that is CR.  What gives us OK that *sublen is sufficiently long that
we can safely subtract 3 from it (we only checked that it is not 0;
who says it is 3 or more in this code???) and the two bytes before
the CR we are looking at here are CRLF???



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux