Re: tsan: t3008: hashmap_add touches size from multiple threads

Martin Ågren <martin.agren@xxxxxxxxx> · Tue, 15 Aug 2017 21:21:12 +0200

On 15 August 2017 at 20:48, Stefan Beller <sbeller@xxxxxxxxxx> wrote:
>>>>         /* total number of entries (0 means the hashmap is empty) */
>>>> -       unsigned int size;
>>>> +       /* -1 means size is unknown for threading reasons */
>>>> +       int size;
>>>
>>> This double-encodes the state of disallow_rehash (i.e. if we had
>>> signed size, then the invariant disallow_rehash === (size < 0)
>>> is true, such that we could omit either the flag and just check for
>>> size < 0 or we do not need the negative size as any user would
>>> need to check disallow_rehash first. Not sure which API is harder
>>> to misuse. I'd think just having the size and getting rid of
>>> disallow_rehash might be hard to to reused.
>>
>> (Do you mean "might be hard to be misused"?)
>
> yes, I do.
>
>> One good thing about turning off the size-tracking with threading is
>> that someone who later wants to know the size in a threaded application
>> will not introduce any subtle bugs by misusing size, but will be forced
>> to provide and use some sort of InterlockedIncrement().
>
> agreed.
>
>> When/if that
>> change happens, it would be nice if no-one relied on the value of size
>> to say anything about threading. So it might make sense to have an
>> implementation-independent way of accessing disallow_rehash a.k.a.
>> (size < 0).
>
> Yes, and my point was whether we want to keep disallow_rehash around,
> as when a patch as this is applied, we'd have it encoded twice,
> both size < 0 as well as disallow_rehash set indicate the rehashing
> disabled.
>
> If we were to reduce it to one, we would not have "invalid" state possible
> such as size < 0 and disallow_rehash = 0.

Agreed.

> In the future we may have more options that make size impossible to
> compute efficiently, such that in that case we'd want to know which
> condition lead to it. In that case we'd want to have the flags around.

Good point.

>> For example a function hashmap_disallow_rehash(), except that's
>> obviously taken. :-) Maybe the existing function would then be
>> hashmap_set_disallow_rehash(). Oh well..
>
> Not sure I understand this one.

Sorry. What I meant was, if we drop the disallow_rehash-field, someone
might be tempted to use size < 0 (or size == -1) to answer the question
"is rehashing disallowed?". (Or "am I threaded?" which already is a
question which the hashmap as it is today doesn't know about.)

So instead of looking at "disallow_rehash" one should perhaps be calling
"hashmap_is_disallow_rehash()" or "hashmap_get_disallow_rehash()", which
would be implemented as "return disallow_rehash", or possibly "return
size == -1".

Except such names are, to the best of my understanding, not the Git-way,
so it should be, e.g., "hashmap_disallow_rehash()".

Except ... that name is taken.... So to free that name up, the existing
function should perhaps be renamed "hashmap_set_disallow_rehash()",
again assuming I've picked up the right conventions in my recent
browsing of the Git-code.

The final "Oh well" was a short form of "it began with an observation
which currently has no practical effect, and is slowly turning into a
chain of ideas on how to rebuild the interface".