Re: [PATCH] lookup_object: split up displacement penalty for hash collisions

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Stefan Beller <stefanbeller@xxxxxxxxxxxxxx> writes:

> A little background on hash tables first:
> Consider you want to have the object X, which you'd expect at position
> i, but because that place was already taken by B, it is not found at
> position i, you start looking right of position i to find X until you
> find it.
>
>     index       | i-1 |  i  | i+1 | i+2 | i+3 | i+4 |
>            --------------------------------------------
>     entry   ... |  A  |  B  |  C  |  D  |  E  |  X  | ...
>            --------------------------------------------
>
> Once you have found X at i+4, the commit 9a414486d9f0 (2013-05-01,
> lookup_object: prioritize recently found objects) did an optimization
> and swapped the object B with X, so the placement looks like
>
>     index       | i-1 |  i  | i+1 | i+2 | i+3 | i+4 |
>            --------------------------------------------
>     entry   ... |  A  |  X  |  C  |  D  |  E  |  B  | ...
>            --------------------------------------------
[...]
> If we now want to find X and X is expected at i, we put X to
> the position i and B to the middle position between B and X at D
> and D will go to the old position of X:
>
>     index       | i-1 |  i  | i+1 | i+2 | i+3 | i+4 |
>            --------------------------------------------
>     entry   ... |  A  |  X  |  C  |  B  |  E  |  D  | ...
>            --------------------------------------------
>
> So let's test how it works out:
> 	# running the current git.git master (c1ebd90c832e), repeat 25 times:
> 	perf stat -r 25 -- ./git-rev-list --all --objects >/dev/null
> 	...
> 	5.294512141 seconds time elapsed    ( +-  7.88% )
> 	# Now running with this patch applied:
> 	5.111576725 seconds time elapsed    ( +-  8.17% )
[...]
> However please do check if this patch brings the promised performance
> on your own, as you're likely using different hardware and another
> software setup. Feel free to share your performance differences.

I get this on an i7-M620 laptop from t/perf/p0001-rev-list.sh:

  Test                               HEAD                next                    
  -------------------------------------------------------------------------------
  0001.1: rev-list --all             6.29(6.03+0.22)     6.33(6.06+0.24) +0.6%   
  0001.2: rev-list --all --objects   53.22(52.48+0.54)   54.90(54.15+0.55) +3.2%*
  -------------------------------------------------------------------------------
  Significance hints:  '.' 0.1  '*' 0.05  '**' 0.01  '***' 0.001

And this on a newer i7-3930K workstation:

  Test                               HEAD                next                    
  -------------------------------------------------------------------------------
  0001.1: rev-list --all             3.86(3.71+0.14)     3.89(3.74+0.14) +0.7%*  
  0001.2: rev-list --all --objects   30.23(29.91+0.29)   30.35(30.04+0.29) +0.4%.
  -------------------------------------------------------------------------------
  Significance hints:  '.' 0.1  '*' 0.05  '**' 0.01  '***' 0.001

That's using linux.git as a test repository, I figured the numbers were
too small with git.git.

I trust the laptop numbers less because it has far more thermal (and
thus throttling) issues, but the runs do show a significant difference,
though less than you claimed.

> @@ -90,9 +90,27 @@ struct object *lookup_object(const unsigned char *sha1)
>  		 * Move object to where we started to look for it so
>  		 * that we do not need to walk the hash table the next
>  		 * time we look for it.
> +		 * However, we don't want to penalize the the object being
> +		 * moved from the first position, so divide the penalty to
> +		 * two different objects.
>  		 */
> +
> +		/*
> +		 * First make sure i > first, so the middle is really
> +		 * in between the i and first object
> +		 */
> +		if (i < first)
> +			i += obj_hash_size;
> +
> +		middle = (i + first) / 2;
> +		if (i >= obj_hash_size)
> +			i -= obj_hash_size;
> +		if (middle >= obj_hash_size)
> +			middle -= obj_hash_size;
> +
>  		struct object *tmp = obj_hash[i];

Adding all the code above means that this declaration is now in the
middle of things, which is not valid C90.  Please move the declaration
to the beginning of the block, and compile with
-Wdeclaration-after-statement to notice this issue.

> -		obj_hash[i] = obj_hash[first];
> +		obj_hash[i] = obj_hash[middle];
> +		obj_hash[middle] = obj_hash[first];
>  		obj_hash[first] = tmp;
>  	}
>  	return obj;

-- 
Thomas Rast
trast@{inf,student}.ethz.ch
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]