On 03.01.2022 22:06, Eric Sunshine wrote:
On Mon, Jan 3, 2022 at 8:19 PM Junio C Hamano <gitster@xxxxxxxxx> wrote:
Eric Sunshine <sunshine@xxxxxxxxxxxxxx> writes:
> On Mon, Jan 3, 2022 at 6:34 PM Junio C Hamano <gitster@xxxxxxxxx> wrote:
>> Eric Sunshine <sunshine@xxxxxxxxxxxxxx> writes:
>> > On Mon, Jan 3, 2022 at 9:24 AM Fabian Stelzer <fs@xxxxxxxxxxxx> wrote:
>> >> - trust_size = strcspn(line, "\n");
>> >> + trust_size = strcspn(line, "\n"); /* truncate at LF */
>> >> + if (trust_size && trust_size != strlen(line) &&
>> >> + line[trust_size - 1] == '\r')
>> >> + trust_size--; /* the LF was part of CRLF at the end */
>> >
>> > I may be misunderstanding, but isn't the strlen() unnecessary?
>> >
>> > if (trust_size && line[trust_size] &&
>> > line[trust_size - 1] == '\r')
>> > trust_size--;
>>
>> That changes behaviour when "line" has more than one lines in it.
>> strcspn() finds the first LF, and the posted patch ignores CRLF not
>> at the end of line[]. Your variant feels more correct if the
>> objective is to find the end of the first line (regardless of the
>> choice of the end-of-line convention, either LF or CRLF) and omit
>> the line terminator.
>
> Okay, that makes sense if that's the intention of the patch. Perhaps
> the commit message should mention that `line` might contain multiple
> lines and that it's only interested in the very last LF (unless it's
> already obvious to everyone else, even though it wasn't to me).
I do not think that is the case. strcspn(line, "\n") will stop at
the first one, so unless it is guaranteed that "line" has only one
line in it, the patch as posted is not correct. Your variant
without strlen() feels more correct, as I said.
Okay, sorry for my unclear thinking. The existing code (before this
patch) does indeed seem to be interested only in the first line of
`line`, in which case I agree that the patch's use of strlen() does
not appear to be correct if `line` could ever contain more than one
line.
I guess we need a bit more context for this patch to make sense:
for (line = ssh_principals_out.buf; *line;
line = strchrnul(line + 1, '\n')) {
while (*line == '\n')
line++;
if (!*line)
break;
trust_size = strcspn(line, "\n"); /* truncate at LF */
if (trust_size && trust_size != strlen(line) &&
line[trust_size - 1] == '\r')
trust_size--; /* the LF was part of CRLF at the end */
principal = xmemdupz(line, trust_size);
ssh_principals_out contains the result of the find-principals call which
contains one found principal per line (normally LF, CRLF in some cygwin
setup).
A principal can contain CR as a valid character. This is problematic if CR
is the last char of the principal since we have no way of knowing then if we
are in cygwin with CRLF line endings or another platform using LF and the CR
is the last character of the principal.
Lets leave this rather weird edge case aside for now.
So what we want to do is split the buffer by line, no matter which line
endings are used, and copy the principal without any line ending characters.
The `trust_size != strlen(line)` check was supposed to guard against `line`
having no LF at all and ending with a CR. I think a
`line[trust_size + 1] != '\0'` would work as well.
But since this whole thing is already hard enough to follow i guess it's
better we simply remove it instead of adding checks for the unlikely case we
encounter a broken ssh-keygen. Especially since the effect would only be a
failed signature validation.
We could even remove the `if (trust_size)` condition since this only happens
when `line` begins with LF which is already skipped over a few lines before.
But it's probably better to leave this in just in case the code changes.
Generally I think this is a common enough problem that there should be a
function to split a strbuf by line no matter if LF or CRLF is used. Similar
to strbuf_getline() but to read from a strbuf or maybe even handle this
within pipe_command() when filling the strbuf. Maybe git even has better
code to handle this but i haven't found it yet?