Re: [PATCH] refs.c: get_ref_cache: use a bucket hash

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Andreas Krey <a.krey@xxxxxx> writes:

> get_ref_cache used a linear list, which obviously is O(n^2).
> Use a fixed bucket hash which just takes a factor of 100000
> (~ 317^2) out of the n^2 - which is enough.
>
> Signed-off-by: Andreas Krey <a.krey@xxxxxx>
> ---
>
> This brings 'git clean -ndx' times down from 17 minutes
> to 11 seconds on one of our workspaces (which accumulated
> a lot of ignored directories).

Nice.

These impressive numbers should go to the commit log message,
together with a bit more numbers to characterise the shape of the
repository that exhibits the problem with the original code.  You
say "a lot of ignored directories", but do you mean directories in
the working tree (which I suppose do not have much to do with the
submodule_ref_caches[])?  I am guessing that the repository has tons
of submodules?  How many is "tons" to make the pain noticeable?

> Actuallly using adaptive
> hashing or other structures seems overkill.

Perhaps _implementing_ these structures only for this codepath may
be overkill, but would it be an overkill to _use_ existing hashmap.c
implementation?  After all, those who wrote the original would have
thought that anything more complex than a linear list would be
overkill, and nobody disagreed until you found that your repository
disagreed with that assumption ;-)

>  refs.c | 13 ++++++++-----
>  1 file changed, 8 insertions(+), 5 deletions(-)
>
> diff --git a/refs.c b/refs.c
> index e23542b..8198d9e 100644
> --- a/refs.c
> +++ b/refs.c
> @@ -982,6 +982,8 @@ struct packed_ref_cache {
>  	struct stat_validity validity;
>  };
>  
> +#define REF_CACHE_HASH 317
> +
>  /*
>   * Future: need to be in "struct repository"
>   * when doing a full libification.
> @@ -996,7 +998,7 @@ static struct ref_cache {
>  	 * is initialized correctly.
>  	 */
>  	char name[1];
> -} ref_cache, *submodule_ref_caches;
> +} ref_cache, *submodule_ref_caches[REF_CACHE_HASH];
>  
>  /* Lock used for the main packed-refs file: */
>  static struct lock_file packlock;
> @@ -1065,18 +1067,19 @@ static struct ref_cache *create_ref_cache(const char *submodule)
>   */
>  static struct ref_cache *get_ref_cache(const char *submodule)
>  {
> -	struct ref_cache *refs;
> +	struct ref_cache *refs, **bucketp;
> +	bucketp = submodule_ref_caches + strhash(submodule) % REF_CACHE_HASH;
>  
>  	if (!submodule || !*submodule)
>  		return &ref_cache;
>  
> -	for (refs = submodule_ref_caches; refs; refs = refs->next)
> +	for (refs = *bucketp; refs; refs = refs->next)
>  		if (!strcmp(submodule, refs->name))
>  			return refs;
>  
>  	refs = create_ref_cache(submodule);
> -	refs->next = submodule_ref_caches;
> -	submodule_ref_caches = refs;
> +	refs->next = *bucketp;
> +	*bucketp = refs;
>  	return refs;
>  }
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]