On Mon, 11 Jun 2018, Frank Tetzel wrote:
Hi, can anybody give me a reasonable explanation why the hash function for a simple unordered_set is called twice as much as I would expect. Example: #include <iostream> #include <unordered_set> struct MyHash{ static int numberOfCalls; size_t operator()(int i) const noexcept{ ++numberOfCalls; //return std::hash<int>{}(i); return i; } }; int MyHash::numberOfCalls = 0; int main(){ std::unordered_set<int,MyHash> set; std::cout << MyHash::numberOfCalls << '\n'; set.reserve(1024); std::cout << MyHash::numberOfCalls << '\n'; for(int i=0; i<1024; ++i){ set.emplace(i); } std::cout << MyHash::numberOfCalls << '\n'; return 0; } output: 0 0 2047 It seems to always be (inserted_elements * 2) - 1. When adding 2 elements, it hashes 3 times. When adding 1 element, it only hashes once. What is happening here? Why is the hash calculated twice? What is so special about the last or first element?
https://gcc.gnu.org/onlinedocs/libstdc++/manual/unordered_associative.html#containers.unordered.cache answers part of the question (libc++ on the other hand seems to have chosen always to cache the hash).
The other part is related to how the elements are linked in the hash table, note that the hashes computed are 0 1 0 2 1 3 2 4 3 5 4 6 5 7 6... Reading the code seems best to understand what is going on.
-- Marc Glisse