Re: [PATCH 2/2] packfile: refactor hash search with fanout table

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Am 02.02.2018 um 23:36 schrieb Jonathan Tan:
> Subsequent patches will introduce file formats that make use of a fanout
> array and a sorted table containing hashes, just like packfiles.
> Refactor the hash search in packfile.c into its own function, so that
> those patches can make use of it as well.
> 
> Signed-off-by: Jonathan Tan <jonathantanmy@xxxxxxxxxx>
> ---
>   packfile.c    | 19 +++++--------------
>   sha1-lookup.c | 24 ++++++++++++++++++++++++
>   sha1-lookup.h | 21 +++++++++++++++++++++
>   3 files changed, 50 insertions(+), 14 deletions(-)
> 
> diff --git a/packfile.c b/packfile.c
> index 58bdced3b..29f5dc239 100644
> --- a/packfile.c
> +++ b/packfile.c
> @@ -1712,7 +1712,8 @@ off_t find_pack_entry_one(const unsigned char *sha1,
>   {
>   	const uint32_t *level1_ofs = p->index_data;
>   	const unsigned char *index = p->index_data;
> -	unsigned hi, lo, stride;
> +	unsigned stride;
> +	int ret;
>   
>   	if (!index) {
>   		if (open_pack_index(p))
> @@ -1725,8 +1726,6 @@ off_t find_pack_entry_one(const unsigned char *sha1,
>   		index += 8;
>   	}
>   	index += 4 * 256;
> -	hi = ntohl(level1_ofs[*sha1]);
> -	lo = ((*sha1 == 0x0) ? 0 : ntohl(level1_ofs[*sha1 - 1]));
>   	if (p->index_version > 1) {
>   		stride = 20;
>   	} else {
> @@ -1734,17 +1733,9 @@ off_t find_pack_entry_one(const unsigned char *sha1,
>   		index += 4;
>   	}
>   
> -	while (lo < hi) {
> -		unsigned mi = lo + (hi - lo) / 2;
> -		int cmp = hashcmp(index + mi * stride, sha1);
> -
> -		if (!cmp)
> -			return nth_packed_object_offset(p, mi);
> -		if (cmp > 0)
> -			hi = mi;
> -		else
> -			lo = mi+1;
> -	}
> +	ret = bsearch_hash(sha1, level1_ofs, index, stride);
> +	if (ret >= 0)
> +		return nth_packed_object_offset(p, ret);

Going from unsigned to signed int means the patch breaks support for
more than 2G pack entries, which was put with 326bf39677 (Use uint32_t
for all packed object counts.) in 2007.

>   	return 0;
>   }
>   
> diff --git a/sha1-lookup.c b/sha1-lookup.c
> index 4cf3ebd92..d11c5e526 100644
> --- a/sha1-lookup.c
> +++ b/sha1-lookup.c
> @@ -99,3 +99,27 @@ int sha1_pos(const unsigned char *sha1, void *table, size_t nr,
>   	} while (lo < hi);
>   	return -lo-1;
>   }
> +
> +int bsearch_hash(const unsigned char *sha1, const void *fanout_,
> +		 const void *table_, size_t stride)
> +{
> +	const uint32_t *fanout = fanout_;

Why hide the type?  It doesn't make the function more generic.

> +	const unsigned char *table = table_;
> +	int hi, lo;
> +
> +	hi = ntohl(fanout[*sha1]);
> +	lo = ((*sha1 == 0x0) ? 0 : ntohl(fanout[*sha1 - 1]));
> +
> +	while (lo < hi) {
> +		unsigned mi = lo + (hi - lo) / 2;
> +		int cmp = hashcmp(table + mi * stride, sha1);
> +
> +		if (!cmp)
> +			return mi;
> +		if (cmp > 0)
> +			hi = mi;
> +		else
> +			lo = mi + 1;
> +	}
> +	return -lo - 1;
> +}
> diff --git a/sha1-lookup.h b/sha1-lookup.h
> index cf5314f40..3c59e9cb1 100644
> --- a/sha1-lookup.h
> +++ b/sha1-lookup.h
> @@ -7,4 +7,25 @@ extern int sha1_pos(const unsigned char *sha1,
>   		    void *table,
>   		    size_t nr,
>   		    sha1_access_fn fn);
> +
> +/*
> + * Searches for sha1 in table, using the given fanout table to determine the
> + * interval to search, then using binary search. Returns the element index of
> + * the position found if successful, -i-1 if not (where i is the index of the
> + * least element that is greater than sha1).
> + *
> + * Takes the following parameters:
> + *
> + *  - sha1: the hash to search for
> + *  - fanout: a 256-element array of NETWORK-order 32-bit integers; the integer
> + *    at position i represents the number of elements in table whose first byte
> + *    is less than or equal to i
> + *  - table: a sorted list of hashes with optional extra information in between
> + *  - stride: distance between two consecutive elements in table (should be
> + *    GIT_MAX_RAWSZ or greater)
> + *
> + * This function does not verify the validity of the fanout table.
> + */
> +extern int bsearch_hash(const unsigned char *sha1, const void *fanout,
> +			const void *table, size_t stride);
>   #endif
> 

Why not use sha1_pos()?  I guess because it avoids the overhead of the
accessor function, right?  And I wonder how much of difference it makes.

A binary search function for embedded hashes just needs the key, a
pointer to the first hash in the array, the stride and the number of
elements.  It can then be used with or without a fanout table, making it
more versatile.  Just a thought.

René



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux