Re: tsan: t3008: hashmap_add touches size from multiple threads

Stefan Beller <sbeller@xxxxxxxxxx> · Tue, 15 Aug 2017 11:17:05 -0700

On Tue, Aug 15, 2017 at 10:59 AM, Jeff Hostetler <git@xxxxxxxxxxxxxxxxx> wrote:
>
>
> On 8/15/2017 8:53 AM, Martin Ågren wrote:
>>
>> Using SANITIZE=thread made t3008-ls-files-lazy-init-name-hash.sh hit
>> the potential race below.
>>
>> What seems to happen is, threaded_lazy_init_name_hash ends up using
>> hashmap_add on the_index.dir_hash from two threads in a way that tsan
>> considers racy. While the buckets each have their own mutex, the "size"
>> does not. So it might end up with the wrong (lower) value. The size is
>> used to determine when to resize, but that should be fine, since
>> resizing is turned off due to threading anyway.
>
>>
>>
>> If the size is ever used for something else, the consequences might
>> range from cosmetic error to devastating. I have a "feeling" the size is
>> not used at the time, although maybe it is, in some roundabout way which
>> I'm not seeing.
>
>
> Good catch!  Yes, the size field is unguarded.  The only references to
> it that I found were used in _add() and _remove() in the need-to-rehash
> check.
>
> Since auto-rehash is turned off, this shouldn't be a problem, but it
> does feel odd to have a possibly incorrect size due to races.
>
> Rather than adding something like (a cross-platform version of)
> InterlockedIncrement(), I'm wondering if it would be better to
> disable/invalidate the size field when auto-rehash is turned off
> so we don't have to worry about it.
>
> Something like this:
>
>
> $ git diff
> diff --git a/hashmap.c b/hashmap.c
> index 9b6a12859b..be97642daa 100644
> --- a/hashmap.c
> +++ b/hashmap.c
> @@ -205,6 +205,9 @@ void hashmap_add(struct hashmap *map, void *entry)
>         ((struct hashmap_entry *) entry)->next = map->table[b];
>         map->table[b] = entry;
>
> +       if (map->disallow_rehash)
> +               return;
> +
>         /* fix size and rehash if appropriate */
>         map->size++;
>         if (map->size > map->grow_at)
> @@ -223,6 +226,9 @@ void *hashmap_remove(struct hashmap *map, const void
> *key, const void *keydata)
>         *e = old->next;
>         old->next = NULL;
>
> +       if (map->disallow_rehash)
> +               return;
> +

Once we have these two checks, the check in rehash itself becomes
redundant (as any code path leading to the check in rehash already
had the check).

Which leads me to wonder if we want to make the size (in/de)crease
part of the rehash function, which could be

    adjust_size(struct hashmap *map, int n)

with `n` either +1 or -1 (the increase value). Depending on 'n', we could
compute the newsize for the potential rehash in that function as well.

Note sure if that is a cleaner internal API.

>         /* fix size and rehash if appropriate */
>         map->size--;
>         if (map->size < map->shrink_at)
> diff --git a/hashmap.h b/hashmap.h
> index 7a8fa7fa3d..ec9541b981 100644
> --- a/hashmap.h
> +++ b/hashmap.h
> @@ -183,7 +183,8 @@ struct hashmap {
>         const void *cmpfn_data;
>
>         /* total number of entries (0 means the hashmap is empty) */
> -       unsigned int size;
> +       /* -1 means size is unknown for threading reasons */
> +       int size;

This double-encodes the state of disallow_rehash (i.e. if we had
signed size, then the invariant disallow_rehash === (size < 0)
is true, such that we could omit either the flag and just check for
size < 0 or we do not need the negative size as any user would
need to check disallow_rehash first. Not sure which API is harder
to misuse. I'd think just having the size and getting rid of
disallow_rehash might be hard to to reused.

>
>         /*
>          * tablesize is the allocated size of the hash table. A non-0 value
> @@ -360,6 +361,15 @@ int hashmap_bucket(const struct hashmap *map, unsigned
> int hash);
>  static inline void hashmap_disallow_rehash(struct hashmap *map, unsigned
> value)
>  {
>         map->disallow_rehash = value;
> +       if (value) {
> +               /*
> +                * Assume threaded operations starting, so don't
> +                * try to keep size current.
> +                */
> +               size = -1;
> +       } else {
> +               /* TODO count items in all buckets and set size. */
> +       }
>  }
>
>  /*
>
>
> Jeff