Re: [Bug] git branch -v has problems with carriage returns

Stefan Beller <sbeller@xxxxxxxxxx> · Wed, 31 May 2017 13:58:16 -0700

On Tue, May 30, 2017 at 10:32 PM, Atousa Duprat <atousa.p@xxxxxxxxx> wrote:
> Here is my first attempt at fixing the issue.

Cool you're looking into this. :)

>
> There are two problems in ref-filter.c:
>
> First, copy_subject() has been modified to turn '\n' into a space and
> every other ascii control character to be ignored.
>
> Second, find_subpos() doesn't realize that a line that only contains a
> '\r\n' is a blank line – at least when using crlf convention.
> I have changed things so that a sequence of either '\n' or "\r\n"
> separate the subject from the body of the commit message.
> I am not looking at the crlf setting because it doesn't seem like a
> useful distinction – when one would we ever care for \r\n not to be a
> blank line?  But it could be done...
>
> Both fixes are minimal, but it feels like they are a issues with the
> specific encoding.  Does git mandate ascii or utf-8 commit messages?
> If not, there may be a larger issue here with encodings and line-end
> conventions at the very least in ref-filter.c
> Guidance would be appreciated for how to deal with this issue...
>
> Patch attached.
>

Please read Documentation/SubmittingPatches
(tl;dr:
(a) please sign your patch, read https://developercertificate.org/
(b) if possible please send patches inline instead of attached)

> diff --git a/ref-filter.c b/ref-filter.c
> index 3a640448f..bc573f481 100644
> --- a/ref-filter.c
> +++ b/ref-filter.c
> @@ -836,11 +836,15 @@ static const char *copy_email(const char *buf)
>  static char *copy_subject(const char *buf, unsigned long len)
>  {
>          char *r = xmemdupz(buf, len);
> -        int i;
> +        int i, j;
>
> -        for (i = 0; i < len; i++)
> +        for (i = 0, j = 0; i < len; i++, j++)
>                  if (r[i] == '\n')
> -                        r[i] = ' ';
> +                        r[j] = ' ';
> +                else if (r[i] < 32)
> +                    j--; // skip ascii control characters that are not '\n'

/*
 * Our comment style uses the other way,
 * as it is compatible with more compilers, still.
 */

This seems to solve a different problem than the carriage return
discussed? So it could go into a separate patch.

> +                else r[j] = r[i];
> +        r[j]=0;
>
>          return r;
>  }
> @@ -956,9 +960,12 @@ static void find_subpos(const char *buf, unsigned long sz,
>                          eol++;
>                  buf = eol;
>          }
> +

stray new line?

>          /* skip any empty lines */
>          while (*buf == '\n')
>                  buf++;
> +        while (*buf == '\r' && *(buf+1) == '\n')
> +                buf += 2;

This first skips LF empty lines and then skips CRLF empty
lines. What if they are mixed? I'd think if we extend the
empty line detection we'd want to robust to such as well,
so maybe

    while (*buf == '\r' || *buf == '\n')
        buf++;

Maybe this is a bit too greedy?