Re: [PATCH] hashmap: add API to disable item counting when threaded

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 





On 9/2/2017 4:17 AM, Jeff King wrote:
On Sat, Sep 02, 2017 at 01:31:19AM +0200, Johannes Schindelin wrote:
Before anybody can ask for this message to be wrapped in _(...) to be
translateable, let me suggest instead to add the prefix "BUG: ".

Agreed on both (and Jonathan's suggestion to just use BUG()).

will do.  thanks.


+static inline void hashmap_enable_item_counting(struct hashmap *map)
+{
+	void *item;
+	unsigned int n = 0;
+	struct hashmap_iter iter;
+
+	hashmap_iter_init(map, &iter);
+	while ((item = hashmap_iter_next(&iter)))
+		n++;
+
+	map->do_count_items = 1;
+	map->private_size = n;
+}

BTW this made me think that we may have a problem in our code since
switching from my original hashmap implementation to the bucket one added
in 6a364ced497 (add a hashtable implementation that supports O(1) removal,
2013-11-14): while it is not expected that there are many collisions, the
"grow_at" logic still essentially assumes the number of buckets to be
equal to the number of hashmap entries.

I'm confused about what the problem is. If I am reading the code
correctly, "size" is always the number of elements and "grow_at" is the
table size times a load factor. Those are the same numbers you'd use to
decide to grow in an open-address table.

It's true that this does not take into account the actual number of
collisions we see (or the average per bucket, or however you want to
count it). But generally nor do open-address schemes (and certainly our
other hash tables just use load factor to decide when to grow).

Am I missing something?

-Peff


Hashmap is not thread-safe by itself.  There are several uses of
it in a threaded context and they all handle their own locking
before accessing the hashmap.  Those usually work by locking the
whole hashmap.

My changes in "lazy-init-name-hash" deviated from that pattern
by locking on individual hash chains.  That is, n locks each
controlling 1/nth of the chains.

https://public-inbox.org/git/1490202865-31325-1-git-send-email-git@xxxxxxxxxxxxxxxxx/

To do that I had to disable automatic rehashing for the duration
of my threaded computation.  The problem that TSan identified is
that "size" is always incremented during inserts and it doesn't
have any locks protecting it.  So even though auto-rehash was
disabled, we are still counting the number of items in the map.
Not a terrible problem, but still a race.

Jeff




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux