Re: [PATCH 11/15] find multi-byte comment chars in unterminated buffers

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Am 12.03.24 um 09:05 schrieb Jeff King:
> On Thu, Mar 07, 2024 at 08:42:22PM +0100, René Scharfe wrote:
>
>>> @@ -2562,7 +2562,7 @@ static int parse_insn_line(struct repository *r, struct todo_item *item,
>>>  	/* left-trim */
>>>  	bol += strspn(bol, " \t");
>>>
>>> -	if (bol == eol || *bol == '\r' || *bol == comment_line_char) {
>>> +	if (bol == eol || *bol == '\r' || starts_with_mem(bol, eol - bol, comment_line_str)) {
>>
>> If the strspn() call is safe (which it is, as the caller expects the
>> string to be NUL-terminated) then you could use starts_with() here and
>> avoid the length calculation.  But that would also match
>> comment_line_str values that contain LF, which the _mem version does not
>> and that's better.
>
> I try not to read too much into the use of string functions on what
> otherwise appears to be an unterminated buffer. While in Git it is quite
> often terminated at allocation time (coming from a strbuf, etc) I feel
> like I've fixed a number of out-of-bounds reads simply due to sloppy
> practices. And even if something is correct today, it is easy for it to
> change, since the assumption is made far away from allocation.
>
> So I dunno. Like you said, fewer computations is fewer opportunity to
> mess things up. I don't like the idea of introducing a new hand-grenade
> that might blow up later, but maybe if it's right next to a strspn()
> call that's already a problem, it's not materially making anything
> worse.

Yeah, and my logic was flawed: If the caller somehow guarantees that a
space or tab occurs before eol then the strspn() call is safe.  Its
presence doesn't guarantee NUL termination.  parse_insn_line() would
not be safe to use without that prerequisite, but that's a different
matter..

>>> @@ -882,7 +882,7 @@ static size_t find_trailer_block_start(const char *buf, size_t len)
>>>
>>>  	/* The first paragraph is the title and cannot be trailers */
>>>  	for (s = buf; s < buf + len; s = next_line(s)) {
>>> -		if (s[0] == comment_line_char)
>>> +		if (starts_with_mem(s, buf + len - s, comment_line_str))
>>>  			continue;
>>>  		if (is_blank_line(s))
>>
>> Another case where starts_with() would be safe to use, as
>> is_blank_line() expects (and gets) a NUL-terminated string, but it would
>> allow matching comment_line_str values that contain LF.
>
> Hmm. Yes, it is a NUL-terminated string always, but the caller has told
> us not to look past end_of_log_message(). I suspect that if there is no
> newline in comment_line_str() it's probably impossible to go past "len"
> (just because the end of the log surely ends with either a NUL or a
> newline). But it feels iffy to me. I dunno.

Same flawed thinking on my part: As long as we're guaranteed a blank
line in the buffer we won't walk past its end.  That doesn't mean we can
assume a NUL is present.  But that's fragile.  The code should use
memchr() instead of strchrnul().

That's not the problem you set out to solve in your series, though, and
you avoid making it worse by respecting the length limit in the code
you change.  #leftoverbits

Keeping track of the remaining length increases code size and adds
opportunities for mistakes.  Not sure how to avoid it, however.  Using
eol instead of len at least avoids subtractions.

tl;dr: Good patch (in v2).

René





[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux