Re: Question regarding hash_resize

On Wed, Jan 09, 2019 at 12:35:37AM +0900, Akira Yokosawa wrote:
> On 2019/01/09 0:28, Paul E. McKenney wrote:
> > On Tue, Jan 08, 2019 at 09:56:57AM +0800, Junchang Wang wrote:
> >> On Tue, Jan 8, 2019 at 7:06 AM Akira Yokosawa <akiyks@xxxxxxxxx> wrote:
> >>> On 2019/01/08 07:54:16 +0900, Akira Yokosawa wrote:
> >>>> Hi Paul,
> >>>>
> >>>> On 2019/01/07 10:33:17 -0800, Paul E. McKenney wrote:
> >>>>> On Mon, Jan 07, 2019 at 09:49:19PM +0800, Junchang Wang wrote:
> >>>>>> Hi all,
> >>>>>>
> >>>>>> I've been reading hash_resize recently, and I have a few questions
> >>>>>> regarding this algorithm. Please take a look if you have time. Any
> >>>>>> suggestions are warmly welcomed.
> >>>>>>
> >>>>>> === Question 1 ===
> >>>>>> In hash_resize.c : hashtab_lock_mod
> >>>>>> 186         if (b > READ_ONCE(htp->ht_resize_cur)) {
> >>>>>> 187                 lsp->hbp[1] = NULL;
> >>>>>> 188                 return;
> >>>>>> 189         }
> >>>>>> 190         htp = rcu_dereference(htp->ht_new);
> >>>>>>
> >>>>>> It seems we are missing a barrier (e.g., smp_mb) in between lines 189
> >>>>>> and 190, because neither READ_ONCE() nor rcu_dereference() can prevent
> >>>>>> compilers and hardware from reordering the two unrelated variables,
> >>>>>> ht_resize_cur and ht_new. Is my understanding correct?
> >>>>>
> >>>>> Ah, but hashtab_lock_mod() is invoked within an RCU read-side critical
> >>>>> section
> >>>>
> >>>> You mean "rcu_read_lock() at the beginning of hashtab_lock_mod() starts
> >>>> an RCU read-side critical section", don't you?
> >>>>
> >>>>>         and there is a synchronize_rcu() between the update to ->ht_new
> >>>>> and the updates to ->ht_resize_cur.  For more details on how this works,
> >>>>> please see https://lwn.net/Articles/573497/.
> >>>>>
> >>>>> Of course, if you find a code path in which a call to hashtab_lock_mod()
> >>>>> is invoked outside of an RCU read-side critical section, that would be
> >>>>> a bug.  (Can you tell me an exception to this rule, that is, a case
> >>>>> where hashtab_lock_mod() could safely be invoked outside of an RCU
> >>>>> read-side critical section?)
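
A minimal sketch of the update-side ordering being relied on here, paraphrased
from hashtab_resize() in hash_resize.c rather than quoted from it (field and
variable names are approximate; locking and the smp_mb() discussed later in
this thread are omitted):

	rcu_assign_pointer(htp->ht_new, htp_new); /* Publish the new table. */
	synchronize_rcu();       /* Wait for pre-existing RCU readers. */
	for (i = 0; i < htp->ht_nbuckets; i++) {
		/* ... move bucket i's elements into the new table ... */
		WRITE_ONCE(htp->ht_resize_cur, i); /* Only now may readers see i. */
	}

If hashtab_lock_mod()'s READ_ONCE() sees a value of ->ht_resize_cur stored
after that grace period, its enclosing RCU read-side critical section must have
begun after the update to ->ht_new, so that update is guaranteed to be visible
to the subsequent rcu_dereference() without any explicit barrier.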
> >>>>>
> >>>>>> === Question 2 ===
> >>>>>> In hash_resize.c, each time an updater wants to access a bucket, the
> >>>>>> updater must first acquire the bucket's lock (htb_lock), preventing
> >>>>>> other updaters from accessing the same bucket concurrently. This
> >>>>>> approach is fine if a bucket's linked list is relatively short, but on
> >>>>>> a larger system where the linked lists are long and the
> >>>>>> perftest_resize thread is running simultaneously, it could become a
> >>>>>> performance bottleneck. One naive solution is to allow multiple
> >>>>>> updaters to access the same bucket, provided they don't operate on the
> >>>>>> same item in that bucket's list. I wonder if there is any existing
> >>>>>> work or discussion on this topic?
> >>>>>
> >>>>> One approach is to use a hashed array of locks, and to hash a given
> >>>>> element's address to locate the lock to be used.  Please see
> >>>>> Section 7.1.1.5 ("Conditional Locking") and Section 7.1.1.6 ("Acquire
> >>>>> Needed Locks First"), including Quick Quiz 7.9, for additional details.
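
For concreteness, a minimal sketch of the hashed-lock-array idea (N_LOCKS,
locktab, and elem_lock() are invented for this illustration; spinlock_t and
spin_lock() are the same primitives used elsewhere in CodeSamples, and
uintptr_t comes from <stdint.h>):

	#define N_LOCKS 1024		/* Power of two, sized to taste. */
	static spinlock_t locktab[N_LOCKS];	/* Initialize each lock before use. */

	/* Hash an element's address to select its lock. */
	static spinlock_t *elem_lock(void *elem)
	{
		/* Shift away low-order bits, which alignment makes mostly zero. */
		return &locktab[((uintptr_t)elem >> 4) % N_LOCKS];
	}

	/* Usage: spin_lock(elem_lock(htep)); ... spin_unlock(elem_lock(htep)); */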
> >>>>>
> >>>>> Another approach is to use RCU to protect traversals, and locks within the
> >>>>> linked-list elements themselves.  These locks are conditionally acquired
> >>>>> (again, please see Section 7.1.1.5), and deadlock is avoided by acquiring
> >>>>> them in list order, and the tricks in Quick Quiz 7.9.
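
And a correspondingly rough sketch of the conditional-locking step for the
second approach, for the case in which a thread already holding a later
element's lock also needs an earlier one (struct locked_elem and its ->lock
field are again invented for this illustration):

	/* Acquire earlier->lock while holding later->lock, without deadlock. */
	static void lock_predecessor(struct locked_elem *earlier,
	                             struct locked_elem *later)
	{
		if (spin_trylock(&earlier->lock))
			return;			/* No contention, keep going.      */
		spin_unlock(&later->lock);	/* Back out, then take both locks  */
		spin_lock(&earlier->lock);	/* in list order.  The caller must */
		spin_lock(&later->lock);	/* revalidate the list afterwards. */
	}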
> >>>>>
> >>>>> Non-blocking synchronization can also be used, but it is often quite a
> >>>>> bit more complicated.  See for example the split-order list of Shalev
> >>>>> and Shavit, along with Desnoyers's RCU-protected extension in the
> >>>>> userspace RCU library.
> >>>>>
> >>>>> But it is usually -way- better to just choose a good hash function and
> >>>>> to increase the number of buckets.  Which is of course one reason for
> >>>>> having resizable hash tables.  ;-)
> >>>>>
> >>>>> But the other techniques can be useful in more complex linked data
> >>>>> structures, such as graphs, where there is no reasonable way to
> >>>>> partition the data.  Nevertheless, many people choose to do the
> >>>>> partitioning anyway, especially on distributed systems.
> >>>>>
> >>>>>> === Question 3 ===
> >>>>>> The Data Structures chapter also discusses other resizable hash tables,
> >>>>>> namely "Resizable, scalable, concurrent hash tables via relativistic
> >>>>>> programming" by Josh Triplett, which can reduce the memory footprint by
> >>>>>> using a single pair of pointers. But my understanding is that
> >>>>>> perftest_resize.c is unique in that it allows you to rebuild the hash
> >>>>>> table using a different hash function, which could be very
> >>>>>> useful in practice (e.g., to prevent DDoS attacks). The other solutions
> >>>>>> do not share this property. Is my understanding correct? Did I miss any
> >>>>>> discussion of this topic in perfbook?
> >>>>>
> >>>>> Indeed, to the best of my knowledge, Herbert Xu's pointer-pair approach
> >>>>> (which I use in hash_resize.c) is the only one allowing arbitrary changes
> >>>>> to hash functions.  I expect that this advantage will become increasingly
> >>>>> important as security issues become more challenging.  Furthermore, I
> >>>>> suspect that the pointer-pair approach is faster and more scalable.
> >>>>> It is certainly simpler.
> >>>>>
> >>>>> On the other hand, one advantage of the other two approaches is decreased
> >>>>> memory consumption.
> >>>>>
> >>>>> Another advantage of Josh Triplett's pointer-unzip approach is that
> >>>>> concurrent updates are (in theory, anyway) not blocked for as long
> >>>>> by resize operations.  The other edge of this sword is that resizing
> >>>>> is much slower, given the need to wait for many RCU grace periods.
> >>>>>
> >>>>> Another advantage of Mathieu Desnoyers's RCUified variant of Shalev
> >>>>> and Shavit's split-order list is that all operations are non-blocking,
> >>>>> which can be important on massively overloaded systems, such as one
> >>>>> might find in cloud computing.
> >>>>>
> >>>>>> === Question 4 ===
> >>>>>> In the current implementation of hash_resize.c, the perftest_resize
> >>>>>> thread can block an updater, and vice versa. It seems this is not what
> >>>>>> we expect. Ideally, they should be allowed to run concurrently, or at
> >>>>>> least the perftest_resize thread should have lower priority so that
> >>>>>> updaters are never blocked by it. Is that right? I'm very interested in
> >>>>>> helping improve this. Please let me know if you have any suggestions.
> >>>>>
> >>>>> In hash_resize.c, an updater is blocked only for the time required to
> >>>>> redisposition a bucket.  This is a great improvement over blocking
> >>>>> updaters for the full resize over all buckets.
> >>>>>
> >>>>> But yes, it is not hard to do better, for example, periodically dropping
> >>>>> the old-table lock in hashtab_resize().  This requires a few careful
> >>>>> adjustments, of course.  Can you tell me what these adjustments are?
> >>>>>
> >>>>> Hmmm...  I could simplify hashtab_lookup(), couldn't I?  After all,
> >>>>> optimizing for the race with hashtab_resize() doesn't make a whole lot
> >>>>> of sense.  Please see the patch below.  Thoughts?
> >>>>>
> >>>>>                                                      Thanx, Paul
> >>>>>
> >>>>> ------------------------------------------------------------------------
> >>>>>
> >>>>> commit 737646a9c868d841b32199b52f5569668975953e
> >>>>> Author: Paul E. McKenney <paulmck@xxxxxxxxxxxxx>
> >>>>> Date:   Mon Jan 7 10:29:14 2019 -0800
> >>>>>
> >>>>>     datastruct/hash: Simplify hashtab_lookup()
> >>>>>
> >>>>>     Because resizing leaves the old hash table intact, and because lookups
> >>>>>     are carried out within RCU read-side critical sections (which prevent
> >>>>>     a second resizing operation from starting), there is no need for a
> >>>>>     lookup to search anywhere but in the old hash table.  And in the common
> >>>>>     case, there is no resize, so there is no new hash table.  Therefore,
> >>>>>     eliminating the check for resizing speeds things up in the common
> >>>>>     case.  In addition, this simplifies the code.
> >>>>>
> >>>>>     This commit therefore eliminates the ht_get_bucket() function,
> >>>>>     renames the ht_get_bucket_single() function to ht_get_bucket(),
> >>>>>     and modifies callers appropriately.
> >>>>>
> >>>>>     Signed-off-by: Paul E. McKenney <paulmck@xxxxxxxxxxxxx>
> >>>>>
> >>>>> diff --git a/CodeSamples/datastruct/hash/hash_resize.c b/CodeSamples/datastruct/hash/hash_resize.c
> >>>>> index 29e05f907200..be4157959b83 100644
> >>>>> --- a/CodeSamples/datastruct/hash/hash_resize.c
> >>>>> +++ b/CodeSamples/datastruct/hash/hash_resize.c
> >>>>> @@ -124,8 +124,7 @@ void hashtab_free(struct hashtab *htp_master)
> >>>>>  //\begin{snippet}[labelbase=ln:datastruct:hash_resize:get_bucket,commandchars=\\\@\$]
> >>>>>  /* Get hash bucket corresponding to key, ignoring the possibility of resize. */
> >>>>>  static struct ht_bucket *                           //\lnlbl{single:b}
> >>>>> -ht_get_bucket_single(struct ht *htp, void *key, long *b,
> >>>>> -                     unsigned long *h)
> >>>>> +ht_get_bucket(struct ht *htp, void *key, long *b, unsigned long *h)
> >>>>>  {
> >>>>>      unsigned long hash = htp->ht_gethash(key);
> >>>>>
> >>>>> @@ -134,24 +133,6 @@ ht_get_bucket_single(struct ht *htp, void *key, long *b,
> >>>>>              *h = hash;                              //\lnlbl{single:h}
> >>>>>      return &htp->ht_bkt[*b];                        //\lnlbl{single:return}
> >>>>>  }                                                   //\lnlbl{single:e}
> >>>>> -
> >>>>> -/* Get hash bucket correesponding to key, accounting for resize. */
> >>>>> -static struct ht_bucket *                           //\lnlbl{b}
> >>>>> -ht_get_bucket(struct ht **htp, void *key, long *b, int *i)
> >>>>> -{
> >>>>> -    struct ht_bucket *htbp;
> >>>>> -
> >>>>> -    htbp = ht_get_bucket_single(*htp, key, b, NULL); //\lnlbl{call_single}
> >>>>> -                                                            //\fcvexclude
> >>>>> -    if (*b <= READ_ONCE((*htp)->ht_resize_cur)) {   //\lnlbl{resized}
> >>>>> -            smp_mb(); /* order ->ht_resize_cur before ->ht_new. */
> >>>>
> >>>> If we can remove this memory barrier, the counterpart smp_mb() in
> >>>> hashtab_resize() becomes unnecessary, doesn't it?
> >>>
> >>> And the WRITE_ONCE() in the following line.
> >>>
> >>>         Thanks, Akira
> >>>>
> >>>>         Thanks, Akira
> >>>>
> >>>>> -            *htp = rcu_dereference((*htp)->ht_new); //\lnlbl{newtable}
> >>>>> -            htbp = ht_get_bucket_single(*htp, key, b, NULL); //\lnlbl{newbucket}
> >>>>> -    }
> >>>>> -    if (i)                                          //\lnlbl{chk_i}
> >>>>> -            *i = (*htp)->ht_idx;                    //\lnlbl{set_idx}
> >>>>> -    return htbp;                                    //\lnlbl{return}
> >>>>> -}                                                   //\lnlbl{e}
> >>>>>  //\end{snippet}
> >>>>>
> >>>>>  /* Read-side lock/unlock functions. */
> >>>>> @@ -178,7 +159,7 @@ hashtab_lock_mod(struct hashtab *htp_master, void *key,
> >>>>>
> >>>>>      rcu_read_lock();                                //\lnlbl{l:rcu_lock}
> >>>>>      htp = rcu_dereference(htp_master->ht_cur);      //\lnlbl{l:refhashtbl}
> >>>>> -    htbp = ht_get_bucket_single(htp, key, &b, &h);  //\lnlbl{l:refbucket}
> >>>>> +    htbp = ht_get_bucket(htp, key, &b, &h);         //\lnlbl{l:refbucket}
> >>>>>      spin_lock(&htbp->htb_lock);                     //\lnlbl{l:acq_bucket}
> >>>>>      lsp->hbp[0] = htbp;                             //\lnlbl{l:lsp0b}
> >>>>>      lsp->hls_idx[0] = htp->ht_idx;
> >>>>> @@ -188,7 +169,7 @@ hashtab_lock_mod(struct hashtab *htp_master, void *key,
> >>>>>              return;                                 //\lnlbl{l:fastret1}
> >>>>>      }
> >>>>>      htp = rcu_dereference(htp->ht_new);             //\lnlbl{l:new_hashtbl}
> >>>>> -    htbp = ht_get_bucket_single(htp, key, &b, &h);  //\lnlbl{l:get_newbkt}
> >>>>> +    htbp = ht_get_bucket(htp, key, &b, &h);         //\lnlbl{l:get_newbkt}
> >>>>>      spin_lock(&htbp->htb_lock);                     //\lnlbl{l:acq_newbkt}
> >>>>>      lsp->hbp[1] = htbp;                             //\lnlbl{l:lsp1b}
> >>>>>      lsp->hls_idx[1] = htp->ht_idx;
> >>>>> @@ -223,16 +204,15 @@ struct ht_elem *                                       //\lnlbl{lkp:b}
> >>>>>  hashtab_lookup(struct hashtab *htp_master, void *key)
> >>>>>  {
> >>>>>      long b;
> >>>>> -    int i;
> >>>>>      struct ht *htp;
> >>>>>      struct ht_elem *htep;
> >>>>>      struct ht_bucket *htbp;
> >>>>>
> >>>>>      htp = rcu_dereference(htp_master->ht_cur);      //\lnlbl{lkp:get_curtbl}
> >>>>> -    htbp = ht_get_bucket(&htp, key, &b, &i);        //\lnlbl{lkp:get_curbkt}
> >>>>> +    htbp = ht_get_bucket(htp, key, &b, NULL);       //\lnlbl{lkp:get_curbkt}
> >>>>>      cds_list_for_each_entry_rcu(htep,               //\lnlbl{lkp:loop:b}
> >>>>>                                  &htbp->htb_head,
> >>>>> -                                hte_next[i]) {
> >>>>> +                                hte_next[htp->ht_idx]) {
> >>>>>              if (htp->ht_cmp(htep, key))             //\lnlbl{lkp:match}
> >>>>>                      return htep;                    //\lnlbl{lkp:ret_match}
> >>>>>      }                                               //\lnlbl{lkp:loop:e}
> >>>>> @@ -303,7 +283,7 @@ int hashtab_resize(struct hashtab *htp_master,
> >>>>>              htbp = &htp->ht_bkt[i];                 //\lnlbl{get_oldcur}
> >>>>>              spin_lock(&htbp->htb_lock);             //\lnlbl{acq_oldcur}
> >>>>>              cds_list_for_each_entry(htep, &htbp->htb_head, hte_next[idx]) { //\lnlbl{loop_list:b}
> >>>>> -                    htbp_new = ht_get_bucket_single(htp_new, htp_new->ht_getkey(htep), &b, NULL);
> >>>>> +                    htbp_new = ht_get_bucket(htp_new, htp_new->ht_getkey(htep), &b, NULL);
> >>>>>                      spin_lock(&htbp_new->htb_lock);
> >>>>>                      cds_list_add_rcu(&htep->hte_next[!idx], &htbp_new->htb_head);
> >>>>>                      spin_unlock(&htbp_new->htb_lock);
> >>>>> diff --git a/datastruct/datastruct.tex b/datastruct/datastruct.tex
> >>>>> index 5c61bf5e2389..0152437c274e 100644
> >>>>> --- a/datastruct/datastruct.tex
> >>>>> +++ b/datastruct/datastruct.tex
> >>>>> @@ -966,10 +966,8 @@ the old table.
> >>>>>  \begin{lineref}[ln:datastruct:hash_resize:get_bucket]
> >>>>>  Bucket selection is shown in
> >>>>>  Listing~\ref{lst:datastruct:Resizable Hash-Table Bucket Selection},
> >>>>> -which shows \co{ht_get_bucket_single()} on
> >>>>> -lines~\lnref{single:b}-\lnref{single:e} and
> >>>>> -\co{ht_get_bucket()} on lines~\lnref{b}-\lnref{e}.
> >>>>> -The \co{ht_get_bucket_single()} function returns a reference to the bucket
> >>>>> +which shows \co{ht_get_bucket()}.
> >>>>> +This function returns a reference to the bucket
> >>>>>  corresponding to the specified key in the specified hash table, without
> >>>>>  making any allowances for resizing.
> >>>>>  It also stores the bucket index corresponding to the key into the location
> >>>>> @@ -978,36 +976,6 @@ line~\lnref{single:gethash}, and the corresponding
> >>>>>  hash value corresponding to the key into the location
> >>>>>  referenced by parameter~\co{h} (if non-\co{NULL}) on line~\lnref{single:h}.
> >>>>>  Line~\lnref{single:return} then returns a reference to the corresponding bucket.
> >>>>> -
> >>>>> -The \co{ht_get_bucket()} function handles hash-table selection, invoking
> >>>>> -\co{ht_get_bucket_single()} on
> >>>>> -line~\lnref{call_single} to select the bucket
> >>>>> -corresponding to the hash in the current
> >>>>> -hash table, storing the hash value through parameter~\co{b}.
> >>>>> -If line~\lnref{resized} determines that the table is being resized and that
> >>>>> -line~\lnref{call_single}'s bucket has already been distributed across the new hash
> >>>>> -table, then line~\lnref{newtable} selects the new hash table and
> >>>>> -line~\lnref{newbucket}
> >>>>> -selects the bucket corresponding to the hash in the new hash table,
> >>>>> -again storing the hash value through parameter~\co{b}.
> >>>>> -\end{lineref}
> >>>>> -
> >>>>> -\QuickQuiz{}
> >>>>> -    The code in
> >>>>> -    Listing~\ref{lst:datastruct:Resizable Hash-Table Bucket Selection}
> >>>>> -    computes the hash twice!
> >>>>> -    Why this blatant inefficiency?
> >>>>> -\QuickQuizAnswer{
> >>>>> -    The reason is that the old and new hash tables might have
> >>>>> -    completely different hash functions, so that a hash computed
> >>>>> -    for the old table might be completely irrelevant to the
> >>>>> -    new table.
> >>>>> -} \QuickQuizEnd
> >>>>> -
> >>>>> -\begin{lineref}[ln:datastruct:hash_resize:get_bucket]
> >>>>> -If line~\lnref{chk_i} finds that parameter~\co{i} is non-\co{NULL}, then
> >>>>> -line~\lnref{set_idx} stores the pointer-set index for the selected hash table.
> >>>>> -Finally, line~\lnref{return} returns a reference to the selected hash bucket.
> >>>>>  \end{lineref}
> >>>>>
> >>>>>  \QuickQuiz{}
> >>>>> @@ -1021,10 +989,8 @@ Finally, line~\lnref{return} returns a reference to the selected hash bucket.
> >>>>>      functions described next.
> >>>>>  } \QuickQuizEnd
> >>>>>
> >>>>> -This implementation of
> >>>>> -\co{ht_get_bucket_single()} and \co{ht_get_bucket()}
> >>>>> -permit lookups and modifications to run concurrently
> >>>>> -with a resize operation.
> >>>>> +This implementation of \co{ht_get_bucket()} permits lookups and
> >>>>> +modifications to run concurrently with a resize operation.
> >>>>>
> >>>>>  \begin{listing}[tb]
> >>>>>  \input{CodeSamples/datastruct/hash/hash_resize@lock_unlock_mod.fcv}
> >>>>> @@ -1129,11 +1095,6 @@ hash lookups.
> >>>>>  Line~\lnref{get_curtbl} fetches the current hash table and
> >>>>>  line~\lnref{get_curbkt} obtains a reference
> >>>>>  to the bucket corresponding to the specified key.
> >>>>> -This bucket will be located in a new resized hash table when a
> >>>>> -resize operation has progressed past the bucket in the old hash
> >>>>> -table that contained the desired data element.
> >>>>> -Note that line~\lnref{get_curbkt} also passes back the index that will be
> >>>>> -used to select the correct set of pointers from the pair in each element.
> >>>>>  The loop spanning lines~\lnref{loop:b}-\lnref{loop:e} searches the bucket,
> >>>>>  so that if line~\lnref{match}
> >>>>>  detects a match,
> >>>>> @@ -1144,22 +1105,17 @@ failure.
> >>>>>  \end{lineref}
> >>>>>
> >>>>>  \QuickQuiz{}
> >>>>> -    In the \co{hashtab_lookup()} function in
> >>>>> -    Listing~\ref{lst:datastruct:Resizable Hash-Table Access Functions},
> >>>>> -    the code carefully finds the right bucket in the new hash table
> >>>>> -    if the element to be looked up has already been distributed
> >>>>> -    by a concurrent resize operation.
> >>>>> -    This seems wasteful for RCU-protected lookups.
> >>>>> -    Why not just stick with the old hash table in this case?
> >>>>> +    \begin{lineref}[ln:datastruct:hash_resize:access:lkp]
> >>>>> +    What if execution reaches line~\lnref{loop:b}
> >>>>> +    of \co{hashtab_lookup()} in
> >>>>> +    Listing~\ref{lst:datastruct:Resizable Hash-Table Access Functions}
> >>>>> +    just after this bucket has been resized.
> >>>>> +    Won't that result in lookup failures?
> >>>>> +    \end{lineref}
> >>>>>  \QuickQuizAnswer{
> >>>>> -    Suppose that a resize operation begins and distributes half of
> >>>>> -    the old table's buckets to the new table.
> >>>>> -    Suppose further that a thread adds a new element that goes into
> >>>>> -    one of the already-distributed buckets, and that this same thread
> >>>>> -    now looks up this newly added element.
> >>>>> -    If lookups unconditionally traversed only the old hash table,
> >>>>> -    this thread would get a lookup failure for the element that it
> >>>>> -    just added, which certainly sounds like a bug to me!
> >>>>> +    No, it won't.
> >>>>> +    Resizing into the new hash table leaves the old hash table
> >>>>> +    intact, courtesy of the pointer pairs.
> >>>>>  } \QuickQuizEnd
> >>>>>
> >>>>>  \begin{lineref}[ln:datastruct:hash_resize:access:add]
> >>>>>
> >>
> >> Hi Paul and Akira,
> >>
> >> Thanks a lot for the comments, which I need some more time to look
> >> into. For Paul's patch, I have a few concerns. Please take a look.
> >>
> >> My understanding is that with this patch, while the resizing thread is
> >> running, an updater may insert/delete an item into/from the new hash
> >> table while readers are still looking up data in the old one, so the
> >> readers are unaware of insertions/deletions happening simultaneously.
> >> For example, it seems the following sequence could happen.
> >>
> >> 1. The resizing thread starts.
> >> 2. The resizing thread successfully passes bucket *B* of the old hash table.
> >> 3. An updater wants to insert a new item *I* which should be inserted
> >> into bucket *B*.
> >> 4. The updater will select the new hash table and insert the item *I*
> >> into the new hash table.
> >> 5. A read request comes in and wants to look up item *I*. The lookup
> >> will check the old hash table and fail, won't it?
> >> 6. The resizing thread exits.
> >> 7. Now subsequent read requests can successfully find item *I*.
> > 
> > Yes, this can happen.
> > 
> >> Is my understanding correct? Please let me know if I misunderstood
> >> anything. Given that this patch accelerates the fast path, I think it
> >> should be OK because resizing typically happens rarely. I just want to
> >> make sure I fully understand the algorithm.
> > 
> > It is a design choice, and some users would prefer not to fail to see
> > new items during a resize.  One approach would be to revert back to
> > the old-style checking, and another would be to provide a separate
> > lookup interface that synchronizes with adds and deletes.
> > 
> > So, I could add a quick quiz with this information, I could revert the
> > change, or I could add another lookup function that provided more timely
> > information.  Left to myself, I would provide a quick quiz, but what
> > do you guys think?
> 
> Hi, I was composing a message, but now I'm replying to this one.
> I think adding a quick quiz would be a good idea.

But in the meantime, it occurred to me that I was looking at the
problem in the wrong way.  I believe that the following patch makes
hashtab_lookup() find elements recently added by hashtab_add(), even
during a resize, and without the need for memory barriers.

The scenario that convinced me to take this approach is when a thread
does hashtab_add(), then immediately searches for the newly added element.
Failing to find it would be quite a surprise to most people.

							Thanx, Paul

------------------------------------------------------------------------

commit b61179bdc22e9750147ad3f540215af225aa3376
Author: Paul E. McKenney <paulmck@xxxxxxxxxxxxx>
Date:   Tue Jan 8 10:29:43 2019 -0800

    datastruct/hash: Make hashtab_lookup() more responsive during resize
    
    If hashtab_add() adds a new element to an already-resized bucket during
    a resize, hashtab_lookup() won't find it until the resize is complete.
    This is a not-unreasonable semantic, but might be quite surprising to most
    users, especially if the thread doing the lookup is the same thread that
    just added the new element.  This commit therefore causes hashtab_lookup()
    to find recently added elements even during resize operations.
    
    Note that this change involved a small refactoring of the code,
    introducing a new ht_search_bucket() helper function.  This refactoring
    avoids the need for memory barriers as well because when an element is
    not found and there is a resize in progress, the new version of the
    hash table is unconditionally searched, thus avoiding the need for a
    racy access to ->ht_resize_cur.  Although this approach can result in
    needless searches of not-yet-filled-in buckets, such searches will
    be rare (assuming resizing is rare), and a search of an empty bucket
    is cheap anyway.
    
    Reported-by: Junchang Wang <junchangwang@xxxxxxxxx>
    Signed-off-by: Paul E. McKenney <paulmck@xxxxxxxxxxxxx>

diff --git a/CodeSamples/datastruct/hash/hash_resize.c b/CodeSamples/datastruct/hash/hash_resize.c
index 9f68a00dabe3..6dbfe020d78d 100644
--- a/CodeSamples/datastruct/hash/hash_resize.c
+++ b/CodeSamples/datastruct/hash/hash_resize.c
@@ -124,7 +124,8 @@ void hashtab_free(struct hashtab *htp_master)
 //\begin{snippet}[labelbase=ln:datastruct:hash_resize:get_bucket,commandchars=\\\@\$]
 /* Get hash bucket corresponding to key, ignoring the possibility of resize. */
 static struct ht_bucket *				//\lnlbl{single:b}
-ht_get_bucket(struct ht *htp, void *key, long *b, unsigned long *h)
+ht_get_bucket(struct ht *htp, void *key,
+              long *b, unsigned long *h)
 {
 	unsigned long hash = htp->ht_gethash(key);
 
@@ -133,6 +134,24 @@ ht_get_bucket(struct ht *htp, void *key, long *b, unsigned long *h)
 		*h = hash;				//\lnlbl{single:h}
 	return &htp->ht_bkt[*b];			//\lnlbl{single:return}
 }							//\lnlbl{single:e}
+
+/* Search the bucket for the specified key in the specified ht structure. */
+static struct ht_elem *					//\lnlbl{hsb:b}
+ht_search_bucket(struct ht *htp, void *key)
+{
+	long b;
+	struct ht_elem *htep;
+	struct ht_bucket *htbp;
+
+	htbp = ht_get_bucket(htp, key, &b, NULL);	//\lnlbl{hsb:get_curbkt}
+	cds_list_for_each_entry_rcu(htep,		//\lnlbl{hsb:loop:b}
+	                            &htbp->htb_head,
+	                            hte_next[htp->ht_idx]) {
+		if (htp->ht_cmp(htep, key)) 		//\lnlbl{hsb:match}
+			return htep;			//\lnlbl{hsb:ret_match}
+	}						//\lnlbl{hsb:loop:e}
+	return NULL;					//\lnlbl{hsb:ret_NULL}
+}							//\lnlbl{hsb:e}
 //\end{snippet}
 
 /* Read-side lock/unlock functions. */
@@ -203,20 +222,17 @@ void hashtab_lookup_done(struct ht_elem *htep)
 struct ht_elem *					//\lnlbl{lkp:b}
 hashtab_lookup(struct hashtab *htp_master, void *key)
 {
-	long b;
 	struct ht *htp;
 	struct ht_elem *htep;
-	struct ht_bucket *htbp;
 
 	htp = rcu_dereference(htp_master->ht_cur);	//\lnlbl{lkp:get_curtbl}
-	htbp = ht_get_bucket(htp, key, &b, NULL);	//\lnlbl{lkp:get_curbkt}
-	cds_list_for_each_entry_rcu(htep,		//\lnlbl{lkp:loop:b}
-	                            &htbp->htb_head,
-	                            hte_next[htp->ht_idx]) {
-		if (htp->ht_cmp(htep, key)) 		//\lnlbl{lkp:match}
-			return htep;			//\lnlbl{lkp:ret_match}
-	}						//\lnlbl{lkp:loop:e}
-	return NULL;					//\lnlbl{lkp:ret_NULL}
+	htep = ht_search_bucket(htp, key);		//\lnlbl{lkp:get_curbkt}
+	if (htep)					//\lnlbl{lkp:entchk}
+		return htep;				//\lnlbl{lkp:ret_match}
+	htp = rcu_dereference(htp->ht_new);		//\lnlbl{lkp:get_nxttbl}
+	if (!htp)					//\lnlbl{lkp:htpchk}
+		return NULL;				//\lnlbl{lkp:noresize}
+	return ht_search_bucket(htp, key);		//\lnlbl{lkp:ret_nxtbkt}
 }							//\lnlbl{lkp:e}
 
 /*
diff --git a/datastruct/datastruct.tex b/datastruct/datastruct.tex
index eb7f790aa5a3..175056cbe911 100644
--- a/datastruct/datastruct.tex
+++ b/datastruct/datastruct.tex
@@ -966,8 +966,10 @@ the old table.
 \begin{lineref}[ln:datastruct:hash_resize:get_bucket]
 Bucket selection is shown in
 Listing~\ref{lst:datastruct:Resizable Hash-Table Bucket Selection},
-which shows \co{ht_get_bucket()}.
-This function returns a reference to the bucket
+which shows \co{ht_get_bucket()} on
+lines~\lnref{single:b}-\lnref{single:e} and \co{ht_search_bucket()} on
+lines~\lnref{hsb:b}-\lnref{hsb:e}.
+The \co{ht_get_bucket()} function returns a reference to the bucket
 corresponding to the specified key in the specified hash table, without
 making any allowances for resizing.
 It also stores the bucket index corresponding to the key into the location
@@ -976,6 +978,17 @@ line~\lnref{single:gethash}, and the corresponding
 hash value corresponding to the key into the location
 referenced by parameter~\co{h} (if non-\co{NULL}) on line~\lnref{single:h}.
 Line~\lnref{single:return} then returns a reference to the corresponding bucket.
+
+The \co{ht_search_bucket()} function searches for the specified key
+within the specified hash-table version.
+Line~\lnref{hsb:get_curbkt} obtains a reference to the bucket corresponding
+to the specified key.
+The loop spanning lines~\lnref{hsb:loop:b}-\lnref{hsb:loop:e} searches
+that bucket, so that if line~\lnref{hsb:match} detects a match,
+line~\lnref{hsb:ret_match} returns a pointer to the enclosing data element.
+Otherwise, if there is no match,
+line~\lnref{hsb:ret_NULL} returns \co{NULL} to indicate
+failure.
 \end{lineref}
 
 \QuickQuiz{}
@@ -1093,31 +1106,18 @@ The \co{hashtab_lookup()} function on
 lines~\lnref{b}-\lnref{e} of the listing does
 hash lookups.
 Line~\lnref{get_curtbl} fetches the current hash table and
-line~\lnref{get_curbkt} obtains a reference
-to the bucket corresponding to the specified key.
-The loop spanning lines~\lnref{loop:b}-\lnref{loop:e} searches the bucket,
-so that if line~\lnref{match}
-detects a match,
-line~\lnref{ret_match} returns a pointer to the enclosing data element.
-Otherwise, if there is no match,
-line~\lnref{ret_NULL} returns \co{NULL} to indicate
-failure.
+line~\lnref{get_curbkt} searches the bucket corresponding to the
+specified key.
+If line~\lnref{entchk} determines that the search was successful,
+line~\lnref{ret_match} returns a pointer to the element that was located.
+Otherwise, line~\lnref{get_nxttbl} picks up a pointer to the next version,
+and if line~\lnref{htpchk} determines that there is no resize in progress,
+line~\lnref{noresize} returns \co{NULL}.
+When there is a resize in progress, execution reaches
+line~\lnref{ret_nxtbkt}, which returns the result of searching the
+bucket corresponding to \co{key} in the new version.
 \end{lineref}
 
-\QuickQuiz{}
-	\begin{lineref}[ln:datastruct:hash_resize:access:lkp]
-	What if execution reaches line~\lnref{loop:b}
-	of \co{hashtab_lookup()} in
-	Listing~\ref{lst:datastruct:Resizable Hash-Table Access Functions}
-	just after this bucket has been resized.
-	Won't that result in lookup failures?
-	\end{lineref}
-\QuickQuizAnswer{
-	No, it won't.
-	Resizing into the new hash table leaves the old hash table
-	intact, courtesy of the pointer pairs.
-} \QuickQuizEnd
-
 \begin{lineref}[ln:datastruct:hash_resize:access:add]
 The \co{hashtab_add()} function on lines~\lnref{b}-\lnref{e} of the listing adds
 new data elements to the hash table.



