Re: [PATCH v2 00/17] Peace with CRLF

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Junio,

On Wed, 16 Dec 2015, Junio C Hamano wrote:

> I inspected all the callsites of this function to see if it is safe
> to use such an updated logic at these callsites, and did not find
> anything problematic.  I could update strbuf_getline() in place, but
> just to be extra careful, this series instead introduces another
> helper, strbuf_getline_crlf(), that is aware of this CRLF business,
> and convert the ones that are safe to update as we verify.
> 
>  * This series converts only the callers of strbuf_getline() that
>    would misbehave when fed a file with CRLF-terminated lines and
>    use the data with an unwanted CR appended at the end.  With the
>    update the code should work as intended with such a file, without
>    breaking the expected behaviour when working on a file with
>    LF-terminated lines.
> 
>  * Callers of strbuf_getline() that expect to only read from our own
>    output do not have to accommodate CRLF-terminated lines, but they
>    can be updated to do so safely if we do not rely on the ability
>    to express a payload that has CR at the end.  For example, the
>    insn sheet "rebase -i" uses typically ends each line with a
>    commit log summary that we ourselves do not read or use, so even
>    if the end-user on a platform with LF lines deliberately does
>    insert \r at the end of the line and strbuf_gets() removed that
>    \r from the payload, no unexpected behaviour should happen.
> 
>    This series does not touch them, but it may be a good GSoC
>    microproject to convert them to use strbuf_getline_crlf().
> 
>  * Callers of strbuf_getline() that call strbuf_trim() immediately
>    on the result before doing anything, or otherwise have logic that
>    tolerates whitespaces at the end of the line, can continue using
>    strbuf_getline() and will not misbehave on CRLF-terminated lines.
> 
>    This series does not touch them; converting them to use
>    strbuf_getline_crlf() would a good way to document them as
>    dealing with "text".  While doing so, they can also lose their
>    custom logic that happens to make CRLF-terminated lines work.
> 
>  * A caller of strbuf_getline() that wants to only see LF-terminated
>    lines (in other words, "ABC\r\n" must be treated as a line with 4
>    bytes payload on it, A B C and CR), would be broken if we replace
>    it with strbuf_getline_crlf().  This series does not touch them,
>    and no follow-up series should, either.

Thanks for the detailed explanation. I totally agree that (2) would make
for a good micro-project.

While I am pretty convinced that strbuf_getline() that retains a CR makes
no sense, it is obviously the correct thing to prove that rather than
assume it (after all, some caller might exploit strbuf_getline() for its
auto-growing capabilities even on a binary stream when it is known that it
cannot contain a 0x0a).

And please accept my gratitude for tackling this project.

Ciao,
Dscho
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]