Re: tsan: t3008: hashmap_add touches size from multiple threads

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 





On 8/15/2017 8:53 AM, Martin Ågren wrote:
Using SANITIZE=thread made t3008-ls-files-lazy-init-name-hash.sh hit
the potential race below.

What seems to happen is, threaded_lazy_init_name_hash ends up using
hashmap_add on the_index.dir_hash from two threads in a way that tsan
considers racy. While the buckets each have their own mutex, the "size"
does not. So it might end up with the wrong (lower) value. The size is
used to determine when to resize, but that should be fine, since
resizing is turned off due to threading anyway.
>
If the size is ever used for something else, the consequences might
range from cosmetic error to devastating. I have a "feeling" the size is
not used at the time, although maybe it is, in some roundabout way which
I'm not seeing.

Good catch!  Yes, the size field is unguarded.  The only references to
it that I found were used in _add() and _remove() in the need-to-rehash
check.

Since auto-rehash is turned off, this shouldn't be a problem, but it
does feel odd to have a possibly incorrect size due to races.

Rather than adding something like (a cross-platform version of)
InterlockedIncrement(), I'm wondering if it would be better to
disable/invalidate the size field when auto-rehash is turned off
so we don't have to worry about it.

Something like this:


$ git diff
diff --git a/hashmap.c b/hashmap.c
index 9b6a12859b..be97642daa 100644
--- a/hashmap.c
+++ b/hashmap.c
@@ -205,6 +205,9 @@ void hashmap_add(struct hashmap *map, void *entry)
        ((struct hashmap_entry *) entry)->next = map->table[b];
        map->table[b] = entry;

+       if (map->disallow_rehash)
+               return;
+
        /* fix size and rehash if appropriate */
        map->size++;
        if (map->size > map->grow_at)
@@ -223,6 +226,9 @@ void *hashmap_remove(struct hashmap *map, const void *key, const void *keydata)
        *e = old->next;
        old->next = NULL;

+       if (map->disallow_rehash)
+               return;
+
        /* fix size and rehash if appropriate */
        map->size--;
        if (map->size < map->shrink_at)
diff --git a/hashmap.h b/hashmap.h
index 7a8fa7fa3d..ec9541b981 100644
--- a/hashmap.h
+++ b/hashmap.h
@@ -183,7 +183,8 @@ struct hashmap {
        const void *cmpfn_data;

        /* total number of entries (0 means the hashmap is empty) */
-       unsigned int size;
+       /* -1 means size is unknown for threading reasons */
+       int size;

        /*
         * tablesize is the allocated size of the hash table. A non-0 value
@@ -360,6 +361,15 @@ int hashmap_bucket(const struct hashmap *map, unsigned int hash); static inline void hashmap_disallow_rehash(struct hashmap *map, unsigned value)
 {
        map->disallow_rehash = value;
+       if (value) {
+               /*
+                * Assume threaded operations starting, so don't
+                * try to keep size current.
+                */
+               size = -1;
+       } else {
+               /* TODO count items in all buckets and set size. */
+       }
 }

 /*


Jeff



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux