On Mon, Oct 02, 2017 at 10:56:51AM -0400, Derrick Stolee wrote:

> +static void find_abbrev_len_for_pack(struct packed_git *p,
> +				     struct min_abbrev_data *mad)
> +{
> +	int match = 0;
> +	uint32_t num, last, first = 0;
> +	struct object_id oid;
> +
> +	open_pack_index(p);
> +	num = p->num_objects;
> +	last = num;
> +	while (first < last) {
> +		uint32_t mid = (first + last) / 2;

This can overflow if we are looking past 2^31. Probably not very common
(things seem to get pretty painful at hundreds of millions of objects),
but it can be written as:

  uint32_t mid = first + (last - first) / 2;

Sadly, this overflow case seems to be present in many of our binary
searches.

> +		const unsigned char *current;
> +		int cmp;
> +
> +		current = nth_packed_object_sha1(p, mid);

nth_packed_object_sha1() has to do some minor computation to come up
with the correct pointer. The normal packfile lookup precomputes the
initial offset and stride outside the loop.

I have no idea if that micro-optimization is actually noticeable, but I
thought I'd mention it as a possible avenue for investigation, since
you're obviously interested in squeezing out performance. ;)

-Peff
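
For concreteness, here is a rough sketch of that hoisted offset/stride
pattern, modeled loosely on find_pack_entry_one(). It is illustrative
only, not the patch under review: the v1/v2 .idx layout constants are
assumptions about the index format, "target" is a stand-in for the hash
being disambiguated, and it assumes the caller has already called
open_pack_index() (as the quoted patch does) and git's internal headers
are in scope.

  /*
   * Sketch: hoist the SHA-1 table base and the per-entry stride out of
   * the binary-search loop instead of calling nth_packed_object_sha1()
   * on every iteration. Layout constants are assumptions, not taken
   * from the patch.
   */
  static uint32_t bsearch_pack_sketch(struct packed_git *p,
  				      const unsigned char *target)
  {
  	const unsigned char *index = p->index_data;
  	uint32_t first = 0, last = p->num_objects;
  	uint32_t stride;

  	if (p->index_version > 1) {
  		index += 8;	/* v2 header: magic + version */
  		stride = 20;	/* v2 table: bare 20-byte SHA-1s */
  	} else {
  		index += 4;	/* v1 entries: skip the 4-byte offset */
  		stride = 24;	/* 4-byte offset + 20-byte SHA-1 */
  	}
  	index += 4 * 256;	/* skip the fan-out table */

  	while (first < last) {
  		uint32_t mid = first + (last - first) / 2;
  		int cmp = hashcmp(index + mid * stride, target);

  		if (cmp < 0)
  			first = mid + 1;
  		else if (cmp > 0)
  			last = mid;
  		else
  			return mid;	/* exact match */
  	}
  	return first;	/* index of first entry >= target */
  }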