Re: number of calls to hash function in unordered_set

Marc Glisse <marc.glisse@xxxxxxxx> · Mon, 11 Jun 2018 20:24:20 +0200 (CEST)

On Mon, 11 Jun 2018, Frank Tetzel wrote:

Hi,

can anybody give me a reasonable explanation why the hash function for
a simple unordered_set is called twice as much as I would expect.

Example:

#include <iostream>
#include <unordered_set>

struct MyHash{
	static int numberOfCalls;

	size_t operator()(int i) const noexcept{
		++numberOfCalls;
		//return std::hash<int>{}(i);
		return i;
	}
};
int MyHash::numberOfCalls = 0;

int main(){
	std::unordered_set<int,MyHash> set;

	std::cout << MyHash::numberOfCalls << '\n';
	set.reserve(1024);
	std::cout << MyHash::numberOfCalls << '\n';
	for(int i=0; i<1024; ++i){
		set.emplace(i);
	}
	std::cout << MyHash::numberOfCalls << '\n';

	return 0;
}

output:
0
0
2047

It seems to always be (inserted_elements * 2) - 1.
When adding 2 elements, it hashes 3 times. When adding 1 element, it
only hashes once. What is happening here? Why is the hash calculated
twice? What is so special about the last or first element?

https://gcc.gnu.org/onlinedocs/libstdc++/manual/unordered_associative.html#containers.unordered.cache 
answers part of the question (libc++ on the other hand seems to have 
chosen always to cache the hash).

The other part is related to how the elements are linked in the hash 
table, note that the hashes computed are 0 1 0 2 1 3 2 4 3 5 4 6 5 7 6... 
Reading the code seems best to understand what is going on.

--
Marc Glisse