Re: [PATCH 04/23] prefix_ref_iterator: don't trim too much

Michael Haggerty <mhagger@xxxxxxxxxxxx> · Wed, 17 May 2017 16:11:15 +0200

On 05/17/2017 02:55 PM, Jeff King wrote:
> On Wed, May 17, 2017 at 02:05:27PM +0200, Michael Haggerty wrote:
> 
>> diff --git a/refs/iterator.c b/refs/iterator.c
>> index bce1f192f7..f33d1b3a39 100644
>> --- a/refs/iterator.c
>> +++ b/refs/iterator.c
>> @@ -292,7 +292,19 @@ static int prefix_ref_iterator_advance(struct ref_iterator *ref_iterator)
>>  		if (!starts_with(iter->iter0->refname, iter->prefix))
>>  			continue;
>>  
>> -		iter->base.refname = iter->iter0->refname + iter->trim;
>> +		if (iter->trim) {
>> +			/*
>> +			 * If there wouldn't be at least one character
>> +			 * left in the refname after trimming, skip
>> +			 * over it:
>> +			 */
>> +			if (memchr(iter->iter0->refname, '\0', iter->trim + 1))
>> +				continue;
> 
> It took me a minute to figure out the logic here. You're looking for the
> end-of-string within the trim boundary, which would be an indication
> that the string itself is smaller than the boundary.
> 
> But what if it returns true, and the string really is shorter than the
> trim size? That would mean we pass a size to memchr that is longer than
> the buffer we pass. Is that legal?
> 
> I suspect it's undefined behavior according to the standard, though I'd
> guess in practice it would be fine. But if I'm understanding it
> correctly, this is the same check as:
> 
>   if (strlen(iter->iter0->refname) <= iter->trim)
> 
> which seems a lot more obvious to me and doesn't fall afoul of weird
> standard issues. The only downside I see is that it would read to the
> end of string when yours could stop at iter->trim bytes. I have no idea
> if that would be measurable (it might even be faster because strlen()
> only has one condition to loop on).

You are correct that I chose `memchr()` over `strlen()` to avoid
scanning a POTENTIALLY EXTREMELY LARGE NUMBER OF CHARACTERS past the
trim length, but of course in real life refnames aren't that long and
`strlen()` might actually be faster.

I *think* `memchr()` is technically OK:

> Implementations shall behave as if they read the memory byte by byte
from the beginning of the bytes pointed to by s and stop at the first
occurrence of c (if it is found in the initial n bytes).

But I agree that the `strlen()` version is also easier to read (I
actually had that version first). So I'll change it as you have suggested.

Thanks,
Michael