On 2019/01/08 10:39:31 -0800, Paul E. McKenney wrote:
> On Wed, Jan 09, 2019 at 12:35:37AM +0900, Akira Yokosawa wrote:
>> On 2019/01/09 0:28, Paul E. McKenney wrote:
>>> On Tue, Jan 08, 2019 at 09:56:57AM +0800, Junchang Wang wrote:
>>>> On Tue, Jan 8, 2019 at 7:06 AM Akira Yokosawa <akiyks@xxxxxxxxx> wrote:
>>>>> On 2019/01/08 07:54:16 +0900, Akira Yokosawa wrote:
>>>>>> Hi Paul,
>>>>>>
>>>>>> On 2019/01/07 10:33:17 -0800, Paul E. McKenney wrote:
>>>>>>> On Mon, Jan 07, 2019 at 09:49:19PM +0800, Junchang Wang wrote:
>>>>>>>> Hi all,
>>>>>>>>
>>>>>>>> I'm reading hash_resize recently, and have a few questions regarding
>>>>>>>> this algorithm. Please take a look if you have time. Any suggestions
>>>>>>>> are warmly welcomed.
>>>>>>>>
>>>>>>>> === Question 1 ===
>>>>>>>> In hash_resize.c : hashtab_lock_mod
>>>>>>>> 186         if (b > READ_ONCE(htp->ht_resize_cur)) {
>>>>>>>> 187                 lsp->hbp[1] = NULL;
>>>>>>>> 188                 return;
>>>>>>>> 189         }
>>>>>>>> 190         htp = rcu_dereference(htp->ht_new);
>>>>>>>>
>>>>>>>> It seems we are missing a barrier (e.g., smp_mb) in between lines 189
>>>>>>>> and 190, because neither READ_ONCE() nor rcu_dereference() can prevent
>>>>>>>> compilers and hardware from reordering the two unrelated variables,
>>>>>>>> ht_resize_cur and ht_new. Is my understanding correct?
>>>>>>>
>>>>>>> Ah, but hashtab_lock_mod() is invoked within an RCU read-side critical
>>>>>>> section
>>>>>>
>>>>>> You mean "rcu_read_lock() at the beginning of hashtab_lock_mod() starts
>>>>>> an RCU read-side critical section", don't you?
>>>>>>
>>>>>>> and there is a synchronize_rcu() between the update to ->ht_new
>>>>>>> and the updates to ->ht_resize_cur. For more details on how this works,
>>>>>>> please see https://lwn.net/Articles/573497/.
>>>>>>>
>>>>>>> Of course, if you find a code path in which a call to hashtab_lock_mod()
>>>>>>> is invoked outside of an RCU read-side critical section, that would be
>>>>>>> a bug. (Can you tell me an exception to this rule, that is, a case
>>>>>>> where hashtab_lock_mod() could safely be invoked outside of an RCU
>>>>>>> read-side critical section?)
>>>>>>>
>>>>>>>> === Question 2 ===
>>>>>>>> In hash_resize.c, each time an updater wants to access a bucket, the
>>>>>>>> updater must first acquire the bucket's lock (htb_lock), preventing
>>>>>>>> other updaters accessing the same bucket concurrently. This approach
>>>>>>>> is OK if the linked list of a bucket is relatively short, but for a
>>>>>>>> larger system where linked lists are long enough and the
>>>>>>>> perftest_resize thread is running simultaneously, it could become a
>>>>>>>> potential performance bottleneck. One naive solution is to allow
>>>>>>>> multiple updaters to access the same bucket, only if they don't
>>>>>>>> operate on the same item of the list of this bucket. I wonder if there
>>>>>>>> are any existing works or discussions on this topic?
>>>>>>>
>>>>>>> One approach is to use a hashed array of locks, and to hash a given
>>>>>>> element's address to locate the lock to be used. Please see
>>>>>>> Section 7.1.1.5 ("Conditional Locking") and Section 7.1.1.6 ("Acquire
>>>>>>> Needed Locks First"), including Quick Quiz 7.9, for additional details.
>>>>>>>
>>>>>>> Another approach is to use RCU to protect traversals, and locks within the
>>>>>>> linked-list elements themselves. These locks are conditionally acquired
>>>>>>> (again, please see Section 7.1.1.5), and deadlock is avoided by acquiring
>>>>>>> them in list order, and the tricks in Quick Quiz 7.9.
>>>>>>>
>>>>>>> Non-blocking synchronization can also be used, but it is often quite a
>>>>>>> bit more complicated. See for example the split-order list of Shalev
>>>>>>> and Shavit, along with Desnoyers's RCU-protected extension in the
>>>>>>> userspace RCU library.
>>>>>>>
>>>>>>> But it is usually -way- better to just choose a good hash function and
>>>>>>> to increase the number of buckets. Which is of course one reason for
>>>>>>> having resizable hash tables. ;-)
>>>>>>>
>>>>>>> But the other techniques can be useful in more complex linked data
>>>>>>> structures, such as graphs, where there is no reasonable way to
>>>>>>> partition the data. Nevertheless, many people choose to do the
>>>>>>> partitioning anyway, especially on distributed systems.
>>>>>>>
>>>>>>>> === Question 3 ===
>>>>>>>> Chapter Data Structures also discusses other resizable hash tables,
>>>>>>>> namely "Resizable, scalable, concurrent hash tables via relativistic
>>>>>>>> programming" from Josh Triplett, which can save memory footprint by
>>>>>>>> using a single pair of pointers. But my understanding is that
>>>>>>>> perftest_resize.c is unique in that it allows you to rebuild the hash
>>>>>>>> table by utilizing a different hash function, which could be very
>>>>>>>> useful in practice (e.g., to prevent DDoS attack). Other solutions do
>>>>>>>> not share this property. Is my understanding correct? Did I miss any
>>>>>>>> discussions on this topic in perfbook?
>>>>>>>
>>>>>>> Indeed, to the best of my knowledge, Herbert Xu's pointer-pair approach
>>>>>>> (which I use in hash_resize.c) is the only one allowing arbitrary changes
>>>>>>> to hash functions. I expect that this advantage will become increasingly
>>>>>>> important as security issues become more challenging. Furthermore, I
>>>>>>> suspect that the pointer-pair approach is faster and more scalable.
>>>>>>> It is certainly simpler.
>>>>>>>
>>>>>>> On the other hand, one advantage of the other two approaches is decreased
>>>>>>> memory consumption.
>>>>>>>
>>>>>>> Another advantage of Josh Triplett's pointer-unzip approach is that
>>>>>>> concurrent updates are (in theory, anyway) not blocked for as long
>>>>>>> by resize operations. The other edge of this sword is that resizing
>>>>>>> is much slower, given the need to wait for many RCU grace periods.
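The hashed array of locks that Paul mentions can be sketched as follows. This is a minimal single-file illustration, not perfbook code; the names (locktab, addr_to_lock, N_LOCKS) are made up for the example:

```c
#include <pthread.h>
#include <stdint.h>

#define N_LOCKS 1024 /* power of two, so masking replaces modulo */

static pthread_mutex_t locktab[N_LOCKS];

/* Initialize the shared array of locks (call once at startup). */
static void locktab_init(void)
{
	int i;

	for (i = 0; i < N_LOCKS; i++)
		pthread_mutex_init(&locktab[i], NULL);
}

/* Hash an element's address to select the lock guarding it. */
static pthread_mutex_t *addr_to_lock(void *p)
{
	uintptr_t a = (uintptr_t)p;

	a ^= a >> 16; /* fold high-order bits in to spread nearby objects */
	return &locktab[a & (N_LOCKS - 1)];
}
```

An updater would call addr_to_lock() on the element it intends to modify and take that mutex, so two updaters collide only when their elements hash to the same lock.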
>>>>>>>
>>>>>>> Another advantage of Mathieu Desnoyers's RCUified variant of Shalev
>>>>>>> and Shavit's split-order list is that all operations are non-blocking,
>>>>>>> which can be important on massively overloaded systems, such as one
>>>>>>> might find in cloud computing.
>>>>>>>
>>>>>>>> === Question 4 ===
>>>>>>>> In the current implementation of hash_resize.c, the perftest_resize
>>>>>>>> could block an updater, and vice versa. It seems this is not what we
>>>>>>>> expected. Ideally, they should be allowed to run concurrently, or at
>>>>>>>> least the perftest_resize thread should have lower priority and
>>>>>>>> updaters should never be blocked by the perftest_resize thread. Is
>>>>>>>> that right? I'm very interested in helping improve this. Please let
>>>>>>>> me know if you have any suggestions.
>>>>>>>
>>>>>>> In hash_resize.c, an updater is blocked only for the time required to
>>>>>>> redisposition a bucket. This is a great improvement over blocking
>>>>>>> updaters for the full resize over all buckets.
>>>>>>>
>>>>>>> But yes, it is not hard to do better, for example, periodically dropping
>>>>>>> the old-table lock in hashtab_resize(). This requires a few careful
>>>>>>> adjustments, of course. Can you tell me what these adjustments are?
>>>>>>>
>>>>>>> Hmmm... I could simplify hashtab_lookup(), couldn't I? After all,
>>>>>>> optimizing for the race with hashtab_resize() doesn't make a whole lot
>>>>>>> of sense. Please see the patch below. Thoughts?
>>>>>>>
>>>>>>> Thanx, Paul
>>>>>>>
>>>>>>> ------------------------------------------------------------------------
>>>>>>>
>>>>>>> commit 737646a9c868d841b32199b52f5569668975953e
>>>>>>> Author: Paul E. McKenney <paulmck@xxxxxxxxxxxxx>
>>>>>>> Date: Mon Jan 7 10:29:14 2019 -0800
>>>>>>>
>>>>>>> datastruct/hash: Simplify hashtab_lookup()
>>>>>>>
>>>>>>> Because resizing leaves the old hash table intact, and because lookups
>>>>>>> are carried out within RCU read-side critical sections (which prevent
>>>>>>> a second resizing operation from starting), there is no need for a
>>>>>>> lookup to search anywhere but in the old hash table. And in the common
>>>>>>> case, there is no resize, so there is no new hash table. Therefore,
>>>>>>> eliminating the check for resizing speeds things up in the common
>>>>>>> case. In addition, this simplifies the code.
>>>>>>>
>>>>>>> This commit therefore eliminates the ht_get_bucket() function,
>>>>>>> renames the ht_get_bucket_single() function to ht_get_bucket(),
>>>>>>> and modifies callers appropriately.
>>>>>>>
>>>>>>> Signed-off-by: Paul E. McKenney <paulmck@xxxxxxxxxxxxx>
>>>>>>>
>>>>>>> diff --git a/CodeSamples/datastruct/hash/hash_resize.c b/CodeSamples/datastruct/hash/hash_resize.c
>>>>>>> index 29e05f907200..be4157959b83 100644
>>>>>>> --- a/CodeSamples/datastruct/hash/hash_resize.c
>>>>>>> +++ b/CodeSamples/datastruct/hash/hash_resize.c
>>>>>>> @@ -124,8 +124,7 @@ void hashtab_free(struct hashtab *htp_master)
>>>>>>> //\begin{snippet}[labelbase=ln:datastruct:hash_resize:get_bucket,commandchars=\\\@\$]
>>>>>>> /* Get hash bucket corresponding to key, ignoring the possibility of resize. */
>>>>>>> static struct ht_bucket * //\lnlbl{single:b}
>>>>>>> -ht_get_bucket_single(struct ht *htp, void *key, long *b,
>>>>>>> - unsigned long *h)
>>>>>>> +ht_get_bucket(struct ht *htp, void *key, long *b, unsigned long *h)
>>>>>>> {
>>>>>>> unsigned long hash = htp->ht_gethash(key);
>>>>>>>
>>>>>>> @@ -134,24 +133,6 @@ ht_get_bucket_single(struct ht *htp, void *key, long *b,
>>>>>>> *h = hash; //\lnlbl{single:h}
>>>>>>> return &htp->ht_bkt[*b]; //\lnlbl{single:return}
>>>>>>> } //\lnlbl{single:e}
>>>>>>> -
>>>>>>> -/* Get hash bucket correesponding to key, accounting for resize. */
>>>>>>> -static struct ht_bucket * //\lnlbl{b}
>>>>>>> -ht_get_bucket(struct ht **htp, void *key, long *b, int *i)
>>>>>>> -{
>>>>>>> - struct ht_bucket *htbp;
>>>>>>> -
>>>>>>> - htbp = ht_get_bucket_single(*htp, key, b, NULL); //\lnlbl{call_single}
>>>>>>> - //\fcvexclude
>>>>>>> - if (*b <= READ_ONCE((*htp)->ht_resize_cur)) { //\lnlbl{resized}
>>>>>>> - smp_mb(); /* order ->ht_resize_cur before ->ht_new. */
>>>>>>
>>>>>> If we can remove this memory barrier, the counterpart smp_mb() in
>>>>>> hashtab_resize() becomes unnecessary, doesn't it?
>>>>>
>>>>> And the WRITE_ONCE() in the following line.
>>>>>
>>>>> Thanks, Akira
>>>>>>
>>>>>> Thanks, Akira
>>>>>>
>>>>>>> - *htp = rcu_dereference((*htp)->ht_new); //\lnlbl{newtable}
>>>>>>> - htbp = ht_get_bucket_single(*htp, key, b, NULL); //\lnlbl{newbucket}
>>>>>>> - }
>>>>>>> - if (i) //\lnlbl{chk_i}
>>>>>>> - *i = (*htp)->ht_idx; //\lnlbl{set_idx}
>>>>>>> - return htbp; //\lnlbl{return}
>>>>>>> -} //\lnlbl{e}
>>>>>>> //\end{snippet}
>>>>>>>
>>>>>>> /* Read-side lock/unlock functions. */
>>>>>>> @@ -178,7 +159,7 @@ hashtab_lock_mod(struct hashtab *htp_master, void *key,
>>>>>>>
>>>>>>> rcu_read_lock(); //\lnlbl{l:rcu_lock}
>>>>>>> htp = rcu_dereference(htp_master->ht_cur); //\lnlbl{l:refhashtbl}
>>>>>>> - htbp = ht_get_bucket_single(htp, key, &b, &h); //\lnlbl{l:refbucket}
>>>>>>> + htbp = ht_get_bucket(htp, key, &b, &h); //\lnlbl{l:refbucket}
>>>>>>> spin_lock(&htbp->htb_lock); //\lnlbl{l:acq_bucket}
>>>>>>> lsp->hbp[0] = htbp; //\lnlbl{l:lsp0b}
>>>>>>> lsp->hls_idx[0] = htp->ht_idx;
>>>>>>> @@ -188,7 +169,7 @@ hashtab_lock_mod(struct hashtab *htp_master, void *key,
>>>>>>> return; //\lnlbl{l:fastret1}
>>>>>>> }
>>>>>>> htp = rcu_dereference(htp->ht_new); //\lnlbl{l:new_hashtbl}
>>>>>>> - htbp = ht_get_bucket_single(htp, key, &b, &h); //\lnlbl{l:get_newbkt}
>>>>>>> + htbp = ht_get_bucket(htp, key, &b, &h); //\lnlbl{l:get_newbkt}
>>>>>>> spin_lock(&htbp->htb_lock); //\lnlbl{l:acq_newbkt}
>>>>>>> lsp->hbp[1] = htbp; //\lnlbl{l:lsp1b}
>>>>>>> lsp->hls_idx[1] = htp->ht_idx;
>>>>>>> @@ -223,16 +204,15 @@ struct ht_elem * //\lnlbl{lkp:b}
>>>>>>> hashtab_lookup(struct hashtab *htp_master, void *key)
>>>>>>> {
>>>>>>> long b;
>>>>>>> - int i;
>>>>>>> struct ht *htp;
>>>>>>> struct ht_elem *htep;
>>>>>>> struct ht_bucket *htbp;
>>>>>>>
>>>>>>> htp = rcu_dereference(htp_master->ht_cur); //\lnlbl{lkp:get_curtbl}
>>>>>>> - htbp = ht_get_bucket(&htp, key, &b, &i); //\lnlbl{lkp:get_curbkt}
>>>>>>> + htbp = ht_get_bucket(htp, key, &b, NULL); //\lnlbl{lkp:get_curbkt}
>>>>>>> cds_list_for_each_entry_rcu(htep, //\lnlbl{lkp:loop:b}
>>>>>>> &htbp->htb_head,
>>>>>>> - hte_next[i]) {
>>>>>>> + hte_next[htp->ht_idx]) {
>>>>>>> if (htp->ht_cmp(htep, key)) //\lnlbl{lkp:match}
>>>>>>> return htep; //\lnlbl{lkp:ret_match}
>>>>>>> } //\lnlbl{lkp:loop:e}
>>>>>>> @@ -303,7 +283,7 @@ int hashtab_resize(struct hashtab *htp_master,
>>>>>>> htbp = &htp->ht_bkt[i]; //\lnlbl{get_oldcur}
>>>>>>> spin_lock(&htbp->htb_lock); //\lnlbl{acq_oldcur}
>>>>>>> cds_list_for_each_entry(htep, &htbp->htb_head, hte_next[idx]) { //\lnlbl{loop_list:b}
>>>>>>> - htbp_new = ht_get_bucket_single(htp_new, htp_new->ht_getkey(htep), &b, NULL);
>>>>>>> + htbp_new = ht_get_bucket(htp_new, htp_new->ht_getkey(htep), &b, NULL);
>>>>>>> spin_lock(&htbp_new->htb_lock);
>>>>>>> cds_list_add_rcu(&htep->hte_next[!idx], &htbp_new->htb_head);
>>>>>>> spin_unlock(&htbp_new->htb_lock);
>>>>>>> diff --git a/datastruct/datastruct.tex b/datastruct/datastruct.tex
>>>>>>> index 5c61bf5e2389..0152437c274e 100644
>>>>>>> --- a/datastruct/datastruct.tex
>>>>>>> +++ b/datastruct/datastruct.tex
>>>>>>> @@ -966,10 +966,8 @@ the old table.
>>>>>>> \begin{lineref}[ln:datastruct:hash_resize:get_bucket]
>>>>>>> Bucket selection is shown in
>>>>>>> Listing~\ref{lst:datastruct:Resizable Hash-Table Bucket Selection},
>>>>>>> -which shows \co{ht_get_bucket_single()} on
>>>>>>> -lines~\lnref{single:b}-\lnref{single:e} and
>>>>>>> -\co{ht_get_bucket()} on lines~\lnref{b}-\lnref{e}.
>>>>>>> -The \co{ht_get_bucket_single()} function returns a reference to the bucket
>>>>>>> +which shows \co{ht_get_bucket()}.
>>>>>>> +This function returns a reference to the bucket
>>>>>>> corresponding to the specified key in the specified hash table, without
>>>>>>> making any allowances for resizing.
>>>>>>> It also stores the bucket index corresponding to the key into the location
>>>>>>> @@ -978,36 +976,6 @@ line~\lnref{single:gethash}, and the corresponding
>>>>>>> hash value corresponding to the key into the location
>>>>>>> referenced by parameter~\co{h} (if non-\co{NULL}) on line~\lnref{single:h}.
>>>>>>> Line~\lnref{single:return} then returns a reference to the corresponding bucket.
>>>>>>> -
>>>>>>> -The \co{ht_get_bucket()} function handles hash-table selection, invoking
>>>>>>> -\co{ht_get_bucket_single()} on
>>>>>>> -line~\lnref{call_single} to select the bucket
>>>>>>> -corresponding to the hash in the current
>>>>>>> -hash table, storing the hash value through parameter~\co{b}.
>>>>>>> -If line~\lnref{resized} determines that the table is being resized and that
>>>>>>> -line~\lnref{call_single}'s bucket has already been distributed across the new hash
>>>>>>> -table, then line~\lnref{newtable} selects the new hash table and
>>>>>>> -line~\lnref{newbucket}
>>>>>>> -selects the bucket corresponding to the hash in the new hash table,
>>>>>>> -again storing the hash value through parameter~\co{b}.
>>>>>>> -\end{lineref}
>>>>>>> -
>>>>>>> -\QuickQuiz{}
>>>>>>> - The code in
>>>>>>> - Listing~\ref{lst:datastruct:Resizable Hash-Table Bucket Selection}
>>>>>>> - computes the hash twice!
>>>>>>> - Why this blatant inefficiency?
>>>>>>> -\QuickQuizAnswer{
>>>>>>> - The reason is that the old and new hash tables might have
>>>>>>> - completely different hash functions, so that a hash computed
>>>>>>> - for the old table might be completely irrelevant to the
>>>>>>> - new table.
>>>>>>> -} \QuickQuizEnd
>>>>>>> -
>>>>>>> -\begin{lineref}[ln:datastruct:hash_resize:get_bucket]
>>>>>>> -If line~\lnref{chk_i} finds that parameter~\co{i} is non-\co{NULL}, then
>>>>>>> -line~\lnref{set_idx} stores the pointer-set index for the selected hash table.
>>>>>>> -Finally, line~\lnref{return} returns a reference to the selected hash bucket.
>>>>>>> \end{lineref}
>>>>>>>
>>>>>>> \QuickQuiz{}
>>>>>>> @@ -1021,10 +989,8 @@ Finally, line~\lnref{return} returns a reference to the selected hash bucket.
>>>>>>> functions described next.
>>>>>>> } \QuickQuizEnd
>>>>>>>
>>>>>>> -This implementation of
>>>>>>> -\co{ht_get_bucket_single()} and \co{ht_get_bucket()}
>>>>>>> -permit lookups and modifications to run concurrently
>>>>>>> -with a resize operation.
>>>>>>> +This implementation of \co{ht_get_bucket()} permits lookups and
>>>>>>> +modifications to run concurrently with a resize operation.
>>>>>>>
>>>>>>> \begin{listing}[tb]
>>>>>>> \input{CodeSamples/datastruct/hash/hash_resize@lock_unlock_mod.fcv}
>>>>>>> @@ -1129,11 +1095,6 @@ hash lookups.
>>>>>>> Line~\lnref{get_curtbl} fetches the current hash table and
>>>>>>> line~\lnref{get_curbkt} obtains a reference
>>>>>>> to the bucket corresponding to the specified key.
>>>>>>> -This bucket will be located in a new resized hash table when a
>>>>>>> -resize operation has progressed past the bucket in the old hash
>>>>>>> -table that contained the desired data element.
>>>>>>> -Note that line~\lnref{get_curbkt} also passes back the index that will be
>>>>>>> -used to select the correct set of pointers from the pair in each element.
>>>>>>> The loop spanning lines~\lnref{loop:b}-\lnref{loop:e} searches the bucket,
>>>>>>> so that if line~\lnref{match}
>>>>>>> detects a match,
>>>>>>> @@ -1144,22 +1105,17 @@ failure.
>>>>>>> \end{lineref}
>>>>>>>
>>>>>>> \QuickQuiz{}
>>>>>>> - In the \co{hashtab_lookup()} function in
>>>>>>> - Listing~\ref{lst:datastruct:Resizable Hash-Table Access Functions},
>>>>>>> - the code carefully finds the right bucket in the new hash table
>>>>>>> - if the element to be looked up has already been distributed
>>>>>>> - by a concurrent resize operation.
>>>>>>> - This seems wasteful for RCU-protected lookups.
>>>>>>> - Why not just stick with the old hash table in this case?
>>>>>>> + \begin{lineref}[ln:datastruct:hash_resize:access:lkp]
>>>>>>> + What if execution reaches line~\lnref{loop:b}
>>>>>>> + of \co{hashtab_lookup()} in
>>>>>>> + Listing~\ref{lst:datastruct:Resizable Hash-Table Access Functions}
>>>>>>> + just after this bucket has been resized.
>>>>>>> + Won't that result in lookup failures?
>>>>>>> + \end{lineref}
>>>>>>> \QuickQuizAnswer{
>>>>>>> - Suppose that a resize operation begins and distributes half of
>>>>>>> - the old table's buckets to the new table.
>>>>>>> - Suppose further that a thread adds a new element that goes into
>>>>>>> - one of the already-distributed buckets, and that this same thread
>>>>>>> - now looks up this newly added element.
>>>>>>> - If lookups unconditionally traversed only the old hash table,
>>>>>>> - this thread would get a lookup failure for the element that it
>>>>>>> - just added, which certainly sounds like a bug to me!
>>>>>>> + No, it won't.
>>>>>>> + Resizing into the new hash table leaves the old hash table
>>>>>>> + intact, courtesy of the pointer pairs.
>>>>>>> } \QuickQuizEnd
>>>>>>>
>>>>>>> \begin{lineref}[ln:datastruct:hash_resize:access:add]
>>>>>>>
>>>>
>>>> Hi Paul and Akira,
>>>>
>>>> Thanks a lot for the comments, which I need some more time to look
>>>> into. For Paul's patch, I have a few concerns. Please take a look.
>>>>
>>>> My understanding is that with this patch, during the time period when
>>>> the resizing thread is running, an updater may insert/delete an item
>>>> into/from the new hash table, while readers are still looking up data
>>>> in the old one, with the result that readers are unaware of
>>>> insertions/deletions happening simultaneously. For example, it seems
>>>> the following sequence could happen.
>>>>
>>>> 1. The resizing thread starts.
>>>> 2. The resizing thread successfully passes bucket *B* of the old hash table.
>>>> 3. An updater wants to insert a new item *I* which should be inserted
>>>> into bucket *B*.
>>>> 4. The updater will select the new hash table and insert the item *I*
>>>> into the new hash table.
>>>> 5. A read request comes in and wants to look up item *I*. The lookup
>>>> request will check the old hash table and fail, won't it?
>>>> 6. The resizing thread exits.
>>>> 7. Now subsequent read requests can successfully find item *I*.
>>>
>>> Yes, this can happen.
>>>
>>>> Is my understanding correct? Please let me know if I misunderstood
>>>> anything. Given that this patch can accelerate the fast path,
>>>> I think it should be OK because resizing typically happens rarely.
>>>> Just want to make sure I fully understand the algorithm.
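The seven-step sequence above can be modeled in miniature. The following is a single-threaded toy, not the real hash_resize.c code; all names are invented, each "bucket" holds at most one key, and the resizer's progress is just an integer:

```c
#include <stdbool.h>

#define TOY_NBKT 4 /* tiny table; one slot per bucket is enough here */

/* Single-slot "buckets"; 0 means empty. */
static int toy_old[TOY_NBKT];
static int toy_new[TOY_NBKT];
static int toy_resize_cur = -1; /* index of last old bucket already moved */

/* Updater: like hashtab_add(), inserts into the new table once the
 * resizer has passed the corresponding old bucket. */
static void toy_add(int key)
{
	int b = key % TOY_NBKT;

	if (b <= toy_resize_cur)
		toy_new[b] = key;
	else
		toy_old[b] = key;
}

/* Lookup as in the first (simplifying) patch: old table only. */
static bool toy_lookup_old_only(int key)
{
	return toy_old[key % TOY_NBKT] == key;
}

/* Lookup as in the second patch: fall back to the new table. */
static bool toy_lookup_both(int key)
{
	int b = key % TOY_NBKT;

	return toy_old[b] == key || toy_new[b] == key;
}
```

With toy_resize_cur set to 1 (buckets 0 and 1 already moved), adding key 5 (bucket 1) puts it in the new table, so toy_lookup_old_only(5) fails exactly as in step 5 of the sequence, while toy_lookup_both(5) succeeds.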
>>> It is a design choice, and some users would prefer not to fail to see
>>> new items during a resize. One approach would be to revert back to
>>> the old-style checking, and another would be to provide a separate
>>> lookup interface that synchronizes with adds and deletes.
>>>
>>> So, I could add a quick quiz with this information, I could revert the
>>> change, or I could add another lookup function that provided more timely
>>> information. Left to myself, I would provide a quick quiz, but what
>>> do you guys think?
>>
>> Hi, I was composing a message, but now I'm replying to this one.
>> I think adding a quick quiz would be a good idea.
>
> But in the meantime, it occurred to me that I was looking at the
> problem in the wrong way. I believe that the following patch makes
> hashtab_lookup() find elements recently added by hashtab_add(), even
> during a resize, and without the need for memory barriers.
>
> The scenario that convinced me to take this approach is when a thread
> does hashtab_add(), then immediately searches for the newly added element.
> Failing to find it would be quite a surprise to most people.

When a thread does hashtab_del() and immediately checks the deletion, it
still finds the deleted element while resizing is in progress. This would
also be a surprise. The current version looks less consistent than the
simpler one did.

Thanks, Akira

>
> Thanx, Paul
>
> ------------------------------------------------------------------------
>
> commit b61179bdc22e9750147ad3f540215af225aa3376
> Author: Paul E. McKenney <paulmck@xxxxxxxxxxxxx>
> Date: Tue Jan 8 10:29:43 2019 -0800
>
> datastruct/hash: Make hashtab_lookup() more responsive during resize
>
> If hashtab_add() adds a new element to an already-resized bucket during
> a resize, hashtab_lookup() won't find it until the resize is complete.
> This is a not-unreasonable semantic, but might be quite surprising to most
> users, especially if the thread doing the lookup is the same thread that
> just added the new element. This commit therefore causes hashtab_lookup()
> to find recently added elements even during resize operations.
>
> Note that this change involved a small refactoring of the code,
> introducing a new ht_search_bucket() helper function. This refactoring
> avoids the need for memory barriers as well because when an element is
> not found and there is a resize in progress, the new version of the
> hash table is unconditionally searched, thus avoiding the need for a
> racy access to ->ht_resize_cur. Although this approach can result in
> needless searches of not-yet-filled-in buckets, such searches will
> be rare (assuming resizing is rare), and a search of an empty bucket
> is cheap anyway.
>
> Reported-by: Junchang Wang <junchangwang@xxxxxxxxx>
> Signed-off-by: Paul E. McKenney <paulmck@xxxxxxxxxxxxx>
>
> diff --git a/CodeSamples/datastruct/hash/hash_resize.c b/CodeSamples/datastruct/hash/hash_resize.c
> index 9f68a00dabe3..6dbfe020d78d 100644
> --- a/CodeSamples/datastruct/hash/hash_resize.c
> +++ b/CodeSamples/datastruct/hash/hash_resize.c
> @@ -124,7 +124,8 @@ void hashtab_free(struct hashtab *htp_master)
> //\begin{snippet}[labelbase=ln:datastruct:hash_resize:get_bucket,commandchars=\\\@\$]
> /* Get hash bucket corresponding to key, ignoring the possibility of resize. */
> static struct ht_bucket * //\lnlbl{single:b}
> -ht_get_bucket(struct ht *htp, void *key, long *b, unsigned long *h)
> +ht_get_bucket(struct ht *htp, void *key,
> + long *b, unsigned long *h)
> {
> unsigned long hash = htp->ht_gethash(key);
>
> @@ -133,6 +134,24 @@ ht_get_bucket(struct ht *htp, void *key, long *b, unsigned long *h)
> *h = hash; //\lnlbl{single:h}
> return &htp->ht_bkt[*b]; //\lnlbl{single:return}
> } //\lnlbl{single:e}
> +
> +/* Search the bucket for the specfied key in the specified ht structure. */
> +static struct ht_elem * //\lnlbl{hsb:b}
> +ht_search_bucket(struct ht *htp, void *key)
> +{
> + long b;
> + struct ht_elem *htep;
> + struct ht_bucket *htbp;
> +
> + htbp = ht_get_bucket(htp, key, &b, NULL); //\lnlbl{hsb:get_curbkt}
> + cds_list_for_each_entry_rcu(htep, //\lnlbl{hsb:loop:b}
> + &htbp->htb_head,
> + hte_next[htp->ht_idx]) {
> + if (htp->ht_cmp(htep, key)) //\lnlbl{hsb:match}
> + return htep; //\lnlbl{hsb:ret_match}
> + } //\lnlbl{hsb:loop:e}
> + return NULL; //\lnlbl{hsb:ret_NULL}
> +} //\lnlbl{hsb:e}
> //\end{snippet}
>
> /* Read-side lock/unlock functions. */
> @@ -203,20 +222,17 @@ void hashtab_lookup_done(struct ht_elem *htep)
> struct ht_elem * //\lnlbl{lkp:b}
> hashtab_lookup(struct hashtab *htp_master, void *key)
> {
> - long b;
> struct ht *htp;
> struct ht_elem *htep;
> - struct ht_bucket *htbp;
>
> htp = rcu_dereference(htp_master->ht_cur); //\lnlbl{lkp:get_curtbl}
> - htbp = ht_get_bucket(htp, key, &b, NULL); //\lnlbl{lkp:get_curbkt}
> - cds_list_for_each_entry_rcu(htep, //\lnlbl{lkp:loop:b}
> - &htbp->htb_head,
> - hte_next[htp->ht_idx]) {
> - if (htp->ht_cmp(htep, key)) //\lnlbl{lkp:match}
> - return htep; //\lnlbl{lkp:ret_match}
> - } //\lnlbl{lkp:loop:e}
> - return NULL; //\lnlbl{lkp:ret_NULL}
> + htep = ht_search_bucket(htp, key); //\lnlbl{lkp:get_curbkt}
> + if (htep) //\lnlbl{lkp:entchk}
> + return htep; //\lnlbl{lkp:ret_match}
> + htp = rcu_dereference(htp->ht_new); //\lnlbl{lkp:get_nxttbl}
> + if (!htp) //\lnlbl{lkp:htpchk}
> + return NULL; //\lnlbl{lkp:noresize}
> + return ht_search_bucket(htp, key); //\lnlbl{lkp:ret_nxtbkt}
> } //\lnlbl{lkp:e}
>
> /*
> diff --git a/datastruct/datastruct.tex b/datastruct/datastruct.tex
> index eb7f790aa5a3..175056cbe911 100644
> --- a/datastruct/datastruct.tex
> +++ b/datastruct/datastruct.tex
> @@ -966,8 +966,10 @@ the old table.
> \begin{lineref}[ln:datastruct:hash_resize:get_bucket]
> Bucket selection is shown in
> Listing~\ref{lst:datastruct:Resizable Hash-Table Bucket Selection},
> -which shows \co{ht_get_bucket()}.
> -This function returns a reference to the bucket
> +which shows \co{ht_get_bucket()} on
> +lines~\lnref{single:b}-\lnref{single:e} and \co{ht_search_bucket()} on
> +lines~\lnref{hsb:b}-\lnref{hsb:e}.
> +The \co{ht_get_bucket()} function returns a reference to the bucket
> corresponding to the specified key in the specified hash table, without
> making any allowances for resizing.
> It also stores the bucket index corresponding to the key into the location
> @@ -976,6 +978,17 @@ line~\lnref{single:gethash}, and the corresponding
> hash value corresponding to the key into the location
> referenced by parameter~\co{h} (if non-\co{NULL}) on line~\lnref{single:h}.
> Line~\lnref{single:return} then returns a reference to the corresponding bucket.
> +
> +The \co{ht_search_bucket()} function searches for the specified key
> +within the specified hash-table version.
> +Line~\lnref{hsb:get_curbkt} obtains a reference to the bucket corresponding
> +to the specified key.
> +The loop spanning lines~\lnref{hsb:loop:b}-\lnref{hsb:loop:e} searches
> +that bucket, so that if line~\lnref{hsb:match} detects a match,
> +line~\lnref{hsb:ret_match} returns a pointer to the enclosing data element.
> +Otherwise, if there is no match,
> +line~\lnref{hsb:ret_NULL} returns \co{NULL} to indicate
> +failure.
> \end{lineref}
>
> \QuickQuiz{}
> @@ -1093,31 +1106,18 @@ The \co{hashtab_lookup()} function on
> lines~\lnref{b}-\lnref{e} of the listing does
> hash lookups.
> Line~\lnref{get_curtbl} fetches the current hash table and
> -line~\lnref{get_curbkt} obtains a reference
> -to the bucket corresponding to the specified key.
> -The loop spanning lines~\lnref{loop:b}-\lnref{loop:e} searches the bucket,
> -so that if line~\lnref{match}
> -detects a match,
> -line~\lnref{ret_match} returns a pointer to the enclosing data element.
> -Otherwise, if there is no match,
> -line~\lnref{ret_NULL} returns \co{NULL} to indicate
> -failure.
> +line~\lnref{get_curbkt} searches the bucket corresponding to the
> +specified key.
> +If line~\lnref{entchk} determines that the search was successful,
> +line~\lnref{ret_match} returns a pointer to the element that was located.
> +Otherwise, line~\lnref{get_nxttbl} picks up a pointer to the next version,
> +and if line~\lnref{htpchk} determines that there is no resize in progress,
> +line~\lnref{noresize} returns \co{NULL}.
> +When there is a resize in progress, execution reaches
> +line~\lnref{ret_nxtbkt}, which returns the result of searching the
> +bucket corresponding to \co{key} in the new version.
> \end{lineref}
>
> -\QuickQuiz{}
> - \begin{lineref}[ln:datastruct:hash_resize:access:lkp]
> - What if execution reaches line~\lnref{loop:b}
> - of \co{hashtab_lookup()} in
> - Listing~\ref{lst:datastruct:Resizable Hash-Table Access Functions}
> - just after this bucket has been resized.
> - Won't that result in lookup failures?
> - \end{lineref}
> -\QuickQuizAnswer{
> - No, it won't.
> - Resizing into the new hash table leaves the old hash table
> - intact, courtesy of the pointer pairs.
> -} \QuickQuizEnd
> -
> \begin{lineref}[ln:datastruct:hash_resize:access:add]
> The \co{hashtab_add()} function on lines~\lnref{b}-\lnref{e} of the listing adds
> new data elements to the hash table.
>
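As an aside on Paul's answer to Question 2 above (locks embedded in the list elements, with deadlock avoided by acquiring them in a consistent order): the ordering trick can be sketched as follows. The names are illustrative, not perfbook code, and "address order" is approximated here via uintptr_t comparison:

```c
#include <pthread.h>
#include <stdint.h>

/* A list element carrying its own lock, per the per-element-locking
 * alternative discussed under Question 2 (illustrative names). */
struct elem {
	pthread_mutex_t lock;
	int key;
};

/* Acquire two elements' locks in address order, so that concurrent
 * updaters can never wait on each other in a cycle. */
static void lock_pair(struct elem *a, struct elem *b)
{
	if ((uintptr_t)a > (uintptr_t)b) {
		struct elem *t = a;

		a = b;
		b = t;
	}
	pthread_mutex_lock(&a->lock);
	if (b != a)
		pthread_mutex_lock(&b->lock);
}

static void unlock_pair(struct elem *a, struct elem *b)
{
	pthread_mutex_unlock(&a->lock);
	if (b != a)
		pthread_mutex_unlock(&b->lock);
}
```

Because both updaters sort the pair before locking, the "grab my element, then wait forever for yours" cycle cannot form; the degenerate a == b case takes the lock only once.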