On 15 August 2017 at 20:17, Stefan Beller <sbeller@xxxxxxxxxx> wrote: > On Tue, Aug 15, 2017 at 10:59 AM, Jeff Hostetler <git@xxxxxxxxxxxxxxxxx> wrote: >> >> >> On 8/15/2017 8:53 AM, Martin Ågren wrote: >>> >>> Using SANITIZE=thread made t3008-ls-files-lazy-init-name-hash.sh hit >>> the potential race below. >>> >>> What seems to happen is, threaded_lazy_init_name_hash ends up using >>> hashmap_add on the_index.dir_hash from two threads in a way that tsan >>> considers racy. While the buckets each have their own mutex, the "size" >>> does not. So it might end up with the wrong (lower) value. The size is >>> used to determine when to resize, but that should be fine, since >>> resizing is turned off due to threading anyway. >> >>> >>> >>> If the size is ever used for something else, the consequences might >>> range from cosmetic error to devastating. I have a "feeling" the size is >>> not used at the time, although maybe it is, in some roundabout way which >>> I'm not seeing. >> >> >> Good catch! Yes, the size field is unguarded. The only references to >> it that I found were used in _add() and _remove() in the need-to-rehash >> check. >> >> Since auto-rehash is turned off, this shouldn't be a problem, but it >> does feel odd to have a possibly incorrect size due to races. >> >> Rather than adding something like (a cross-platform version of) >> InterlockedIncrement(), I'm wondering if it would be better to >> disable/invalidate the size field when auto-rehash is turned off >> so we don't have to worry about it. >> >> Something like this: >> >> >> $ git diff >> diff --git a/hashmap.c b/hashmap.c >> index 9b6a12859b..be97642daa 100644 >> --- a/hashmap.c >> +++ b/hashmap.c >> @@ -205,6 +205,9 @@ void hashmap_add(struct hashmap *map, void *entry) >> ((struct hashmap_entry *) entry)->next = map->table[b]; >> map->table[b] = entry; >> >> + if (map->disallow_rehash) >> + return; >> + >> /* fix size and rehash if appropriate */ >> map->size++; >> if (map->size > map->grow_at) >> @@ -223,6 +226,9 @@ void *hashmap_remove(struct hashmap *map, const void >> *key, const void *keydata) >> *e = old->next; >> old->next = NULL; >> >> + if (map->disallow_rehash) >> + return; >> + > > > Once we have these two checks, the check in rehash itself becomes > redundant (as any code path leading to the check in rehash already > had the check). > > Which leads me to wonder if we want to make the size (in/de)crease > part of the rehash function, which could be > > adjust_size(struct hashmap *map, int n) > > with `n` either +1 or -1 (the increase value). Depending on 'n', we could > compute the newsize for the potential rehash in that function as well. > > Note sure if that is a cleaner internal API. > >> /* fix size and rehash if appropriate */ >> map->size--; >> if (map->size < map->shrink_at) >> diff --git a/hashmap.h b/hashmap.h >> index 7a8fa7fa3d..ec9541b981 100644 >> --- a/hashmap.h >> +++ b/hashmap.h >> @@ -183,7 +183,8 @@ struct hashmap { >> const void *cmpfn_data; >> >> /* total number of entries (0 means the hashmap is empty) */ >> - unsigned int size; >> + /* -1 means size is unknown for threading reasons */ >> + int size; > > This double-encodes the state of disallow_rehash (i.e. if we had > signed size, then the invariant disallow_rehash === (size < 0) > is true, such that we could omit either the flag and just check for > size < 0 or we do not need the negative size as any user would > need to check disallow_rehash first. Not sure which API is harder > to misuse. I'd think just having the size and getting rid of > disallow_rehash might be hard to to reused. (Do you mean "might be hard to be misused"?) One good thing about turning off the size-tracking with threading is that someone who later wants to know the size in a threaded application will not introduce any subtle bugs by misusing size, but will be forced to provide and use some sort of InterlockedIncrement(). When/if that change happens, it would be nice if no-one relied on the value of size to say anything about threading. So it might make sense to have an implementation-independent way of accessing disallow_rehash a.k.a. (size < 0). For example a function hashmap_disallow_rehash(), except that's obviously taken. :-) Maybe the existing function would then be hashmap_set_disallow_rehash(). Oh well.. > >> >> /* >> * tablesize is the allocated size of the hash table. A non-0 value >> @@ -360,6 +361,15 @@ int hashmap_bucket(const struct hashmap *map, unsigned >> int hash); >> static inline void hashmap_disallow_rehash(struct hashmap *map, unsigned >> value) >> { >> map->disallow_rehash = value; >> + if (value) { >> + /* >> + * Assume threaded operations starting, so don't >> + * try to keep size current. >> + */ >> + size = -1; >> + } else { >> + /* TODO count items in all buckets and set size. */ >> + } >> } >> >> /* >> >> >> Jeff