Re: [PATCH v2 08/21] read_packed_refs(): read references with minimal copying

Michael Haggerty <mhagger@xxxxxxxxxxxx> · Thu, 21 Sep 2017 09:34:04 +0200

On 09/20/2017 08:27 PM, Jeff King wrote:
> On Tue, Sep 19, 2017 at 08:22:16AM +0200, Michael Haggerty wrote:
> 
>> Instead of copying data from the `packed-refs` file one line at time
>> and then processing it, process the data in place as much as possible.
>>
>> Also, instead of processing one line per iteration of the main loop,
>> process a reference line plus its corresponding peeled line (if
>> present) together.
>>
>> Note that this change slightly tightens up the parsing of the
>> `parse-ref` file. Previously, the parser would have accepted multiple
> 
> s/parse-ref/packed-refs/, I assume

Thanks; will fix.

> The patch itself looks good, though I did notice an interesting tangent.
> 
>> +		if (eof - pos < GIT_SHA1_HEXSZ + 2 ||
>> +		    parse_oid_hex(p, &oid, &p) ||
>> +		    !isspace(*p++))
>> +			die_invalid_line(refs->path, pos, eof - pos);
> 
> I wondered why you didn't just check the output of parse_oid_hex(), and
> included the length check (since in the long run we'd like to get rid of
> uses of the static GIT_SHA1_HEXSZ macro). I imagine the answer is that
> this is an mmap'd buffer, and we can't guarantee that parse_oid_hex()
> wouldn't walk off the end of it.

Yes.

> That's fine for now, but I suspect it may become a problem when we move
> to having a second hash function with a different length. You can't just
> say "it must have as many bytes as the longest hash", because of course
> we could have the shorter hash at the end of the buffer. But we also
> can't say "it must have as many bytes as the shortest hash", because if
> the content implies it's a longer hash, we'd read off the end of the
> buffer.
> 
> I think in the long run we will need a parse_oid_hex() function that
> takes a ptr/len (or start/end) pair.

Yes, that makes sense.

> [...]

Michael