"Philippe Blain via GitGitGadget" <gitgitgadget@xxxxxxxxx> writes: > From: Philippe Blain <levraiphilippeblain@xxxxxxxxx> > > The ref-filter API does not correctly handle commit or tag messages that > ... (I won't repeat myself here; see 0/3) > Signed-off-by: Philippe Blain <levraiphilippeblain@xxxxxxxxx> > --- > ref-filter.c | 19 +++++++++++++++++-- > t/t3203-branch-output.sh | 26 +++++++++++++++++++++----- > t/t6300-for-each-ref.sh | 5 +++++ > t/t7004-tag.sh | 7 +++++++ > 4 files changed, 50 insertions(+), 7 deletions(-) > > diff --git a/ref-filter.c b/ref-filter.c > index 79bb5206783..537cc4de42c 100644 > --- a/ref-filter.c > +++ b/ref-filter.c > @@ -1050,10 +1050,18 @@ static char *copy_subject(const char *buf, unsigned long len) > { > char *r = xmemdupz(buf, len); > int i; > + struct strbuf sb = STRBUF_INIT; > > - for (i = 0; i < len; i++) > + strbuf_attach(&sb, r, len, len + 1); > + for (i = 0; i < sb.len; i++) { > if (r[i] == '\n') > r[i] = ' '; > + if (r[i] == '\r') { > + strbuf_remove(&sb, i, 1); > + i -= 1; > + } > + } > + strbuf_detach(&sb, NULL); > return r; > } So, the chosen solution is to only remove CR and do so only in the first paragraph. Even if the second and subsequent paragraphs use CRLF line endings, those CRs are retained. Also, a lone CR in the first paragraph that is not part of CRLF end-of-line marker is lost, but other control characters like "\a" are retained. That sounds like an almost "minimum" change but not quite. The way strbuf is used in the implementation is a bit curious and risky. Currently we do not realloc to shrink a strbuf, but when we start doing so, this code would break because you are relying on the fact that just before calling strbuf_detach(), sb.buf happens to be still the same as r. As the point of the function is "we want to return a copy of what is in buf[0..len] but the input is a (possibly multi-line) paragraph, and we want a single line 'title', so replace end-of-line with a SP", a minimal translation that is more faithful to the intended meaning of the function would be: static char *copy_subject(const char *buf, unsigned long len) { struct strbuf sb = STRBUF_INIT; for (i = 0; i < len; i++) { if (buf[i] == '\r' && i + 1 < len && buf[i + 1] == '\n') continue; /* ignore CR in CRLF */ if (buf[i] == '\n') strbuf_addch(&sb, ' '); else strbuf_addch(&sb, buf[i]); } return strbuf_detach(&sb, NULL); } perhaps? This retains CR in the middle if exists just like BEL in the middle of the line, and uses strbuf in a safe way, I think. > @@ -1184,15 +1192,22 @@ static void find_subpos(const char *buf, > eol = strchrnul(buf, '\n'); > if (*eol) > eol++; > + /* protect against messages that might contain \r\n */ > + if (*eol == '\r') > + eol++; This is quite convoluted. You found a LF and then are hoping to see if the byte after LF is CR (i.e. you are looking for LFCR, not CRLF). > buf = eol; > } > *sublen = buf - *sub; > /* drop trailing newline, if present */ > if (*sublen && (*sub)[*sublen - 1] == '\n') > *sublen -= 1; > + /* protect against commit messages that might contain \r\n */ > + else if (*sublen && (*sub)[*sublen - 1] == '\r') > + *sublen -= 3; /* drop '\r\n\r' */ Yeek. To find CR-LF-CR-LF, you look for CR-LF-CR? You only checked that the previous byte is NOT LF (because you are in else-if, so the previous if must have failed) and you have at least one previous byte that is CR. What gives us OK that *sublen is sufficiently long that we can safely subtract 3 from it (we only checked that it is not 0; who says it is 3 or more in this code???) and the two bytes before the CR we are looking at here are CRLF???