Re: tsan: t3008: hashmap_add touches size from multiple threads

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 





On 8/15/2017 3:21 PM, Martin Ågren wrote:
On 15 August 2017 at 20:48, Stefan Beller <sbeller@xxxxxxxxxx> wrote:
         /* total number of entries (0 means the hashmap is empty) */
-       unsigned int size;
+       /* -1 means size is unknown for threading reasons */
+       int size;

This double-encodes the state of disallow_rehash (i.e. if we had
signed size, then the invariant disallow_rehash === (size < 0)
is true, such that we could omit either the flag and just check for
size < 0 or we do not need the negative size as any user would
need to check disallow_rehash first. Not sure which API is harder
to misuse. I'd think just having the size and getting rid of
disallow_rehash might be hard to to reused.

(Do you mean "might be hard to be misused"?)

yes, I do.

One good thing about turning off the size-tracking with threading is
that someone who later wants to know the size in a threaded application
will not introduce any subtle bugs by misusing size, but will be forced
to provide and use some sort of InterlockedIncrement().

agreed.

When/if that
change happens, it would be nice if no-one relied on the value of size
to say anything about threading. So it might make sense to have an
implementation-independent way of accessing disallow_rehash a.k.a.
(size < 0).

Yes, and my point was whether we want to keep disallow_rehash around,
as when a patch as this is applied, we'd have it encoded twice,
both size < 0 as well as disallow_rehash set indicate the rehashing
disabled.

If we were to reduce it to one, we would not have "invalid" state possible
such as size < 0 and disallow_rehash = 0.

Agreed.

In the future we may have more options that make size impossible to
compute efficiently, such that in that case we'd want to know which
condition lead to it. In that case we'd want to have the flags around.

Good point.

I feel like we're trying to push hashmaps a little beyond
their capability.  I mean the core hashmap code is NOT thread
safe.  The caller is responsible for carefully controlling how
the hashmap is used and whatever locking strategy it wants --
whether it is a single lock on the entire hashmap -- or a set
of partition-specific locks like I created here.  Whatever the
strategy, it is outside of hashmap.[ch].

Perhaps it would be best to just define things as:
* let (size < 0) mean we choose not to compute/track it (without
  saying why).
* keep "disallow_rehash = 1" to mean we do not want automatic
  resizing (without saying why).

Thread-aware callers will set both.

Thread-aware callers (when finished with threaded operations)
can themselves choose whether to compute the correct size and
re-allow rehashing.  And we can add a method to hashmap.c to
re-calculate the size if we want.


In my lazy_init_name_hash() I set "disallow", do the threaded
code, and then unset "disallow" -- mainly to keep the usage
consistent with the non-threaded case.  I could just as easily
set "disallow" and leave it that way -- the question is whether
we care if the hashmap automatically resizes later.  (I don't.)


For example a function hashmap_disallow_rehash(), except that's
obviously taken. :-) Maybe the existing function would then be
hashmap_set_disallow_rehash(). Oh well..

Not sure I understand this one.

Sorry. What I meant was, if we drop the disallow_rehash-field, someone
might be tempted to use size < 0 (or size == -1) to answer the question
"is rehashing disallowed?". (Or "am I threaded?" which already is a
question which the hashmap as it is today doesn't know about.)

So instead of looking at "disallow_rehash" one should perhaps be calling
"hashmap_is_disallow_rehash()" or "hashmap_get_disallow_rehash()", which
would be implemented as "return disallow_rehash", or possibly "return
size == -1".

Except such names are, to the best of my understanding, not the Git-way,
so it should be, e.g., "hashmap_disallow_rehash()".

Except ... that name is taken.... So to free that name up, the existing
function should perhaps be renamed "hashmap_set_disallow_rehash()",
again assuming I've picked up the right conventions in my recent
browsing of the Git-code.

The final "Oh well" was a short form of "it began with an observation
which currently has no practical effect, and is slowly turning into a
chain of ideas on how to rebuild the interface".




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux