Hi Peff,

On Sat, 2 Sep 2017, Jeff King wrote:

> On Sat, Sep 02, 2017 at 01:31:19AM +0200, Johannes Schindelin wrote:
>
> > > https://public-inbox.org/git/adb37b70139fd1e2bac18bfd22c8b96683ae18eb.1502780344.git.martin.agren@xxxxxxxxx/
> > [...]
> > > +static inline void hashmap_enable_item_counting(struct hashmap *map)
> > > +{
> > > +	void *item;
> > > +	unsigned int n = 0;
> > > +	struct hashmap_iter iter;
> > > +
> > > +	hashmap_iter_init(map, &iter);
> > > +	while ((item = hashmap_iter_next(&iter)))
> > > +		n++;
> > > +
> > > +	map->do_count_items = 1;
> > > +	map->private_size = n;
> > > +}
> > BTW this made me think that we may have a problem in our code since
> > switching from my original hashmap implementation to the bucket one
> > added in 6a364ced497 (add a hashtable implementation that supports
> > O(1) removal, 2013-11-14): while it is not expected that there are
> > many collisions, the "grow_at" logic still essentially assumes the
> > number of buckets to be equal to the number of hashmap entries.
>
> I'm confused about what the problem is. If I am reading the code
> correctly, "size" is always the number of elements and "grow_at" is the
> table size times a load factor. Those are the same numbers you'd use to
> decide to grow in an open-address table.
>
> It's true that this does not take into account the actual number of
> collisions we see (or the average per bucket, or however you want to
> count it). But generally nor do open-address schemes (and certainly our
> other hash tables just use load factor to decide when to grow).

All I tried to point out is that, in the worst case, only a single
bucket is in use by the time the table is grown.

Ciao,
Dscho
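P.S.: To make that worst case concrete, here is a minimal, self-contained
sketch. This is hypothetical illustration code, not the actual hashmap.c
(LOAD_FACTOR, grow_at() and the concrete numbers are made up): a purely
load-factor-based grow check only compares the entry count against the
table size times the load factor, so it triggers at exactly the same
point even when a pathological hash function drops every entry into the
very same bucket.

#include <stdio.h>

/* Hypothetical stand-in for a load-factor-based grow check. */
#define LOAD_FACTOR 80	/* grow once size exceeds 80% of the table size */

static unsigned int grow_at(unsigned int table_size)
{
	return (unsigned int)((unsigned long long)table_size * LOAD_FACTOR / 100);
}

int main(void)
{
	unsigned int table_size = 64, size = 0, i;

	/* insert 100 entries that all hash to the same bucket (worst case) */
	for (i = 0; i < 100; i++) {
		size++;
		if (size > grow_at(table_size)) {
			table_size *= 2;
			printf("grew to %u buckets after %u entries "
			       "(still only 1 bucket in use)\n",
			       table_size, size);
		}
	}
	return 0;
}

In other words, the check is oblivious to how the entries are distributed
over the buckets; whether that matters in practice is a separate question.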