Re: Question regarding hash_resize

On Mon, Jan 07, 2019 at 09:49:19PM +0800, Junchang Wang wrote:
> Hi all,
> 
> I've been reading hash_resize recently and have a few questions about
> this algorithm. Please take a look if you have time. Any suggestions
> are warmly welcomed.
> 
> === Question 1 ===
> In hash_resize.c : hashtab_lock_mod
> 186         if (b > READ_ONCE(htp->ht_resize_cur)) {
> 187                 lsp->hbp[1] = NULL;
> 188                 return;
> 189         }
> 190         htp = rcu_dereference(htp->ht_new);
> 
> It seems we are missing a barrier (e.g., smp_mb) between lines 189
> and 190, because neither READ_ONCE() nor rcu_dereference() can prevent
> compilers and hardware from reordering accesses to the two unrelated
> variables, ht_resize_cur and ht_new. Is my understanding correct?

Ah, but hashtab_lock_mod() is invoked within an RCU read-side critical
section and there is a synchronize_rcu() between the update to ->ht_new
and the updates to ->ht_resize_cur.  For more details on how this works,
please see https://lwn.net/Articles/573497/.
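To make that ordering argument concrete, here is a toy model of the
resize-side publication order.  All of the names below (toy_ht,
toy_resize, toy_lock_mod, toy_synchronize_rcu) are illustrative
stand-ins, not the actual hash_resize.c code, and the grace period is
stubbed out; the point is only where synchronize_rcu() sits relative
to the two stores:

```c
#include <stddef.h>

/* Toy model of the publication order in hashtab_resize().  All names
 * are illustrative; this is not the actual hash_resize.c code. */
struct toy_ht {
	long ht_resize_cur;	/* Last bucket distributed, or -1. */
	struct toy_ht *ht_new;	/* New table, once published. */
};

/* Stand-in for synchronize_rcu(): waits out all pre-existing readers. */
static void toy_synchronize_rcu(void)
{
}

/* Updater: the order of the three steps below is the whole point. */
static void toy_resize(struct toy_ht *oldp, struct toy_ht *newp, int nbkts)
{
	oldp->ht_new = newp;		 /* Step 1: publish new table. */
	toy_synchronize_rcu();		 /* Step 2: wait for old readers. */
	for (int i = 0; i < nbkts; i++)
		oldp->ht_resize_cur = i; /* Step 3: advance bucket by bucket. */
}

/* Reader, within an RCU read-side critical section: a reader that
 * observes b <= ->ht_resize_cur must have begun after step 1 completed,
 * and thus is guaranteed to observe a non-NULL ->ht_new. */
static struct toy_ht *toy_lock_mod(struct toy_ht *htp, long b)
{
	if (b > htp->ht_resize_cur)
		return htp;		/* Bucket not yet distributed. */
	return htp->ht_new;		/* Cannot be NULL here. */
}
```

Any reader that can see step 3's store to ->ht_resize_cur started its
read-side critical section after step 2's grace period began, hence
after step 1's store became visible, which is why no explicit barrier
is needed between the two reads.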

Of course, if you find a code path in which a call to hashtab_lock_mod()
is invoked outside of an RCU read-side critical section, that would be
a bug.  (Can you tell me an exception to this rule, that is, a case
where hashtab_lock_mod() could safely be invoked outside of an RCU
read-side critical section?)

> === Question 2 ===
> In hash_resize.c, each time an updater wants to access a bucket, the
> updater must first acquire the bucket's lock (htb_lock), preventing
> other updaters from accessing the same bucket concurrently. This
> approach is fine if a bucket's linked list is relatively short, but
> on a larger system where the linked lists are long and the
> perftest_resize thread is running simultaneously, it could become a
> performance bottleneck. One naive solution is to allow multiple
> updaters to access the same bucket, provided they don't operate on
> the same item in that bucket's list. I wonder if there is any
> existing work or discussion on this topic?

One approach is to use a hashed array of locks, and to hash a given
element's address to locate the lock to be used.  Please see
Section 7.1.1.5 ("Conditional Locking") and Section 7.1.1.6 ("Acquire
Needed Locks First"), including Quick Quiz 7.9, for additional details.
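As a rough sketch of that hashed-array-of-locks idea ("lock striping"):
the array name, size, and helpers below (elem_lock[], N_ELEM_LOCKS,
elem_lock_of(), elem_lock_pair()) are invented for illustration, not
taken from perfbook's code, and the range initializer is a GCC
extension:

```c
#include <pthread.h>
#include <stdint.h>

#define N_ELEM_LOCKS 256	/* Power of two for cheap masking. */

static pthread_mutex_t elem_lock[N_ELEM_LOCKS] = {
	[0 ... N_ELEM_LOCKS - 1] = PTHREAD_MUTEX_INITIALIZER
};

/* Hash an element's address to select the lock protecting it. */
static pthread_mutex_t *elem_lock_of(void *elem)
{
	uintptr_t a = (uintptr_t)elem;

	/* Discard low-order bits that are zero due to alignment. */
	return &elem_lock[(a >> 4) & (N_ELEM_LOCKS - 1)];
}

/* Acquire the locks covering two elements without deadlock by always
 * taking the lower-addressed lock first (cf. Section 7.1.1.6). */
static void elem_lock_pair(void *e1, void *e2)
{
	pthread_mutex_t *l1 = elem_lock_of(e1);
	pthread_mutex_t *l2 = elem_lock_of(e2);

	if (l1 == l2) {
		pthread_mutex_lock(l1);	/* Same stripe: one lock suffices. */
	} else if (l1 < l2) {
		pthread_mutex_lock(l1);
		pthread_mutex_lock(l2);
	} else {
		pthread_mutex_lock(l2);
		pthread_mutex_lock(l1);
	}
}

static void elem_unlock_pair(void *e1, void *e2)
{
	pthread_mutex_t *l1 = elem_lock_of(e1);
	pthread_mutex_t *l2 = elem_lock_of(e2);

	pthread_mutex_unlock(l1);
	if (l2 != l1)
		pthread_mutex_unlock(l2);
}
```

The tradeoff is that each stripe covers many unrelated elements, so
some false contention remains, but the lock array's size stays fixed
no matter how large the list grows.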

Another approach is to use RCU to protect traversals, and locks within the
linked-list elements themselves.  These locks are conditionally acquired
(again, please see Section 7.1.1.5), and deadlock is avoided by acquiring
them in list order, along with the tricks in Quick Quiz 7.9.

Non-blocking synchronization can also be used, but it is often quite a
bit more complicated.  See for example the split-order list of Shalev
and Shavit, along with Desnoyers's RCU-protected extension in the
userspace RCU library.

But it is usually -way- better to just choose a good hash function and
to increase the number of buckets.  Which is of course one reason for
having resizable hash tables.  ;-)

But the other techniques can be useful in more complex linked data
structures, such as graphs, where there is no reasonable way to
partition the data.  Nevertheless, many people choose to do the
partitioning anyway, especially on distributed systems.

> === Question 3 ===
> Chapter Data Structures also discusses other resizable hash tables,
> namely "Resizable, scalable, concurrent hash tables via relativistic
> programming" from Josh Triplett, which can reduce the memory footprint
> by using a single pair of pointers. But my understanding is that
> hash_resize.c is unique in that it allows you to rebuild the hash
> table using a different hash function, which could be very useful
> in practice (e.g., to mitigate DDoS attacks). Other solutions do
> not share this property. Is my understanding correct? Did I miss any
> discussions of this topic in perfbook?

Indeed, to the best of my knowledge, Herbert Xu's pointer-pair approach
(which I use in hash_resize.c) is the only one allowing arbitrary changes
to hash functions.  I expect that this advantage will become increasingly
important as security issues become more challenging.  Furthermore, I
suspect that the pointer-pair approach is faster and more scalable.
It is certainly simpler.

On the other hand, one advantage of the other two approaches is decreased
memory consumption.

Another advantage of Josh Triplett's pointer-unzip approach is that
concurrent updates are (in theory, anyway) not blocked for as long
by resize operations.  The other edge of this sword is that resizing
is much slower, given the need to wait for many RCU grace periods.

Another advantage of Mathieu Desnoyers's RCUified variant of Shalev
and Shavit's split-order list is that all operations are non-blocking,
which can be important on massively overloaded systems, such as one
might find in cloud computing.

> === Question 4 ===
> In the current implementation of hash_resize.c, the perftest_resize
> thread could block an updater, and vice versa. It seems this is not
> what we expect. Ideally, they should be allowed to run concurrently,
> or at least the perftest_resize thread should have lower priority so
> that updaters are never blocked by it. Is that right? I'm very
> interested in helping improve this. Please let me know if you have
> any suggestions.

In hash_resize.c, an updater is blocked only for the time required to
redistribute a single bucket.  This is a great improvement over blocking
updaters for the full resize over all buckets.

But yes, it is not hard to do better, for example, periodically dropping
the old-table lock in hashtab_resize().  This requires a few careful
adjustments, of course.  Can you tell me what these adjustments are?

Hmmm...  I could simplify hashtab_lookup(), couldn't I?  After all,
optimizing for the race with hashtab_resize() doesn't make a whole lot
of sense.  Please see the patch below.  Thoughts?

							Thanx, Paul

------------------------------------------------------------------------

commit 737646a9c868d841b32199b52f5569668975953e
Author: Paul E. McKenney <paulmck@xxxxxxxxxxxxx>
Date:   Mon Jan 7 10:29:14 2019 -0800

    datastruct/hash: Simplify hashtab_lookup()
    
    Because resizing leaves the old hash table intact, and because lookups
    are carried out within RCU read-side critical sections (which prevent
    a second resizing operation from starting), there is no need for a
    lookup to search anywhere but in the old hash table.  And in the common
    case, there is no resize, so there is no new hash table.  Therefore,
    eliminating the check for resizing speeds things up in the common
    case.  In addition, this simplifies the code.
    
    This commit therefore eliminates the ht_get_bucket() function,
    renames the ht_get_bucket_single() function to ht_get_bucket(),
    and modifies callers appropriately.
    
    Signed-off-by: Paul E. McKenney <paulmck@xxxxxxxxxxxxx>

diff --git a/CodeSamples/datastruct/hash/hash_resize.c b/CodeSamples/datastruct/hash/hash_resize.c
index 29e05f907200..be4157959b83 100644
--- a/CodeSamples/datastruct/hash/hash_resize.c
+++ b/CodeSamples/datastruct/hash/hash_resize.c
@@ -124,8 +124,7 @@ void hashtab_free(struct hashtab *htp_master)
 //\begin{snippet}[labelbase=ln:datastruct:hash_resize:get_bucket,commandchars=\\\@\$]
 /* Get hash bucket corresponding to key, ignoring the possibility of resize. */
 static struct ht_bucket *				//\lnlbl{single:b}
-ht_get_bucket_single(struct ht *htp, void *key, long *b,
-                     unsigned long *h)
+ht_get_bucket(struct ht *htp, void *key, long *b, unsigned long *h)
 {
 	unsigned long hash = htp->ht_gethash(key);
 
@@ -134,24 +133,6 @@ ht_get_bucket_single(struct ht *htp, void *key, long *b,
 		*h = hash;				//\lnlbl{single:h}
 	return &htp->ht_bkt[*b];			//\lnlbl{single:return}
 }							//\lnlbl{single:e}
-
-/* Get hash bucket correesponding to key, accounting for resize. */
-static struct ht_bucket *				//\lnlbl{b}
-ht_get_bucket(struct ht **htp, void *key, long *b, int *i)
-{
-	struct ht_bucket *htbp;
-
-	htbp = ht_get_bucket_single(*htp, key, b, NULL); //\lnlbl{call_single}
-								//\fcvexclude
-	if (*b <= READ_ONCE((*htp)->ht_resize_cur)) {	//\lnlbl{resized}
-		smp_mb(); /* order ->ht_resize_cur before ->ht_new. */
-		*htp = rcu_dereference((*htp)->ht_new);	//\lnlbl{newtable}
-		htbp = ht_get_bucket_single(*htp, key, b, NULL); //\lnlbl{newbucket}
-	}
-	if (i)						//\lnlbl{chk_i}
-		*i = (*htp)->ht_idx;			//\lnlbl{set_idx}
-	return htbp;					//\lnlbl{return}
-}							//\lnlbl{e}
 //\end{snippet}
 
 /* Read-side lock/unlock functions. */
@@ -178,7 +159,7 @@ hashtab_lock_mod(struct hashtab *htp_master, void *key,
 
 	rcu_read_lock();				//\lnlbl{l:rcu_lock}
 	htp = rcu_dereference(htp_master->ht_cur);	//\lnlbl{l:refhashtbl}
-	htbp = ht_get_bucket_single(htp, key, &b, &h);	//\lnlbl{l:refbucket}
+	htbp = ht_get_bucket(htp, key, &b, &h);		//\lnlbl{l:refbucket}
 	spin_lock(&htbp->htb_lock);			//\lnlbl{l:acq_bucket}
 	lsp->hbp[0] = htbp;				//\lnlbl{l:lsp0b}
 	lsp->hls_idx[0] = htp->ht_idx;
@@ -188,7 +169,7 @@ hashtab_lock_mod(struct hashtab *htp_master, void *key,
 		return;					//\lnlbl{l:fastret1}
 	}
 	htp = rcu_dereference(htp->ht_new);		//\lnlbl{l:new_hashtbl}
-	htbp = ht_get_bucket_single(htp, key, &b, &h);	//\lnlbl{l:get_newbkt}
+	htbp = ht_get_bucket(htp, key, &b, &h);		//\lnlbl{l:get_newbkt}
 	spin_lock(&htbp->htb_lock);			//\lnlbl{l:acq_newbkt}
 	lsp->hbp[1] = htbp;				//\lnlbl{l:lsp1b}
 	lsp->hls_idx[1] = htp->ht_idx;
@@ -223,16 +204,15 @@ struct ht_elem *					//\lnlbl{lkp:b}
 hashtab_lookup(struct hashtab *htp_master, void *key)
 {
 	long b;
-	int i;
 	struct ht *htp;
 	struct ht_elem *htep;
 	struct ht_bucket *htbp;
 
 	htp = rcu_dereference(htp_master->ht_cur);	//\lnlbl{lkp:get_curtbl}
-	htbp = ht_get_bucket(&htp, key, &b, &i);	//\lnlbl{lkp:get_curbkt}
+	htbp = ht_get_bucket(htp, key, &b, NULL);	//\lnlbl{lkp:get_curbkt}
 	cds_list_for_each_entry_rcu(htep,		//\lnlbl{lkp:loop:b}
 	                            &htbp->htb_head,
-	                            hte_next[i]) {
+	                            hte_next[htp->ht_idx]) {
 		if (htp->ht_cmp(htep, key)) 		//\lnlbl{lkp:match}
 			return htep;			//\lnlbl{lkp:ret_match}
 	}						//\lnlbl{lkp:loop:e}
@@ -303,7 +283,7 @@ int hashtab_resize(struct hashtab *htp_master,
 		htbp = &htp->ht_bkt[i];			//\lnlbl{get_oldcur}
 		spin_lock(&htbp->htb_lock);		//\lnlbl{acq_oldcur}
 		cds_list_for_each_entry(htep, &htbp->htb_head, hte_next[idx]) { //\lnlbl{loop_list:b}
-			htbp_new = ht_get_bucket_single(htp_new, htp_new->ht_getkey(htep), &b, NULL);
+			htbp_new = ht_get_bucket(htp_new, htp_new->ht_getkey(htep), &b, NULL);
 			spin_lock(&htbp_new->htb_lock);
 			cds_list_add_rcu(&htep->hte_next[!idx], &htbp_new->htb_head);
 			spin_unlock(&htbp_new->htb_lock);
diff --git a/datastruct/datastruct.tex b/datastruct/datastruct.tex
index 5c61bf5e2389..0152437c274e 100644
--- a/datastruct/datastruct.tex
+++ b/datastruct/datastruct.tex
@@ -966,10 +966,8 @@ the old table.
 \begin{lineref}[ln:datastruct:hash_resize:get_bucket]
 Bucket selection is shown in
 Listing~\ref{lst:datastruct:Resizable Hash-Table Bucket Selection},
-which shows \co{ht_get_bucket_single()} on
-lines~\lnref{single:b}-\lnref{single:e} and
-\co{ht_get_bucket()} on lines~\lnref{b}-\lnref{e}.
-The \co{ht_get_bucket_single()} function returns a reference to the bucket
+which shows \co{ht_get_bucket()}.
+This function returns a reference to the bucket
 corresponding to the specified key in the specified hash table, without
 making any allowances for resizing.
 It also stores the bucket index corresponding to the key into the location
@@ -978,36 +976,6 @@ line~\lnref{single:gethash}, and the corresponding
 hash value corresponding to the key into the location
 referenced by parameter~\co{h} (if non-\co{NULL}) on line~\lnref{single:h}.
 Line~\lnref{single:return} then returns a reference to the corresponding bucket.
-
-The \co{ht_get_bucket()} function handles hash-table selection, invoking
-\co{ht_get_bucket_single()} on
-line~\lnref{call_single} to select the bucket
-corresponding to the hash in the current
-hash table, storing the hash value through parameter~\co{b}.
-If line~\lnref{resized} determines that the table is being resized and that
-line~\lnref{call_single}'s bucket has already been distributed across the new hash
-table, then line~\lnref{newtable} selects the new hash table and
-line~\lnref{newbucket}
-selects the bucket corresponding to the hash in the new hash table,
-again storing the hash value through parameter~\co{b}.
-\end{lineref}
-
-\QuickQuiz{}
-	The code in
-	Listing~\ref{lst:datastruct:Resizable Hash-Table Bucket Selection}
-	computes the hash twice!
-	Why this blatant inefficiency?
-\QuickQuizAnswer{
-	The reason is that the old and new hash tables might have
-	completely different hash functions, so that a hash computed
-	for the old table might be completely irrelevant to the
-	new table.
-} \QuickQuizEnd
-
-\begin{lineref}[ln:datastruct:hash_resize:get_bucket]
-If line~\lnref{chk_i} finds that parameter~\co{i} is non-\co{NULL}, then
-line~\lnref{set_idx} stores the pointer-set index for the selected hash table.
-Finally, line~\lnref{return} returns a reference to the selected hash bucket.
 \end{lineref}
 
 \QuickQuiz{}
@@ -1021,10 +989,8 @@ Finally, line~\lnref{return} returns a reference to the selected hash bucket.
 	functions described next.
 } \QuickQuizEnd
 
-This implementation of
-\co{ht_get_bucket_single()} and \co{ht_get_bucket()}
-permit lookups and modifications to run concurrently
-with a resize operation.
+This implementation of \co{ht_get_bucket()} permits lookups and
+modifications to run concurrently with a resize operation.
 
 \begin{listing}[tb]
 \input{CodeSamples/datastruct/hash/hash_resize@lock_unlock_mod.fcv}
@@ -1129,11 +1095,6 @@ hash lookups.
 Line~\lnref{get_curtbl} fetches the current hash table and
 line~\lnref{get_curbkt} obtains a reference
 to the bucket corresponding to the specified key.
-This bucket will be located in a new resized hash table when a
-resize operation has progressed past the bucket in the old hash
-table that contained the desired data element.
-Note that line~\lnref{get_curbkt} also passes back the index that will be
-used to select the correct set of pointers from the pair in each element.
 The loop spanning lines~\lnref{loop:b}-\lnref{loop:e} searches the bucket,
 so that if line~\lnref{match}
 detects a match,
@@ -1144,22 +1105,17 @@ failure.
 \end{lineref}
 
 \QuickQuiz{}
-	In the \co{hashtab_lookup()} function in
-	Listing~\ref{lst:datastruct:Resizable Hash-Table Access Functions},
-	the code carefully finds the right bucket in the new hash table
-	if the element to be looked up has already been distributed
-	by a concurrent resize operation.
-	This seems wasteful for RCU-protected lookups.
-	Why not just stick with the old hash table in this case?
+	\begin{lineref}[ln:datastruct:hash_resize:access:lkp]
+	What if execution reaches line~\lnref{loop:b}
+	of \co{hashtab_lookup()} in
+	Listing~\ref{lst:datastruct:Resizable Hash-Table Access Functions}
+	just after this bucket has been resized.
+	Won't that result in lookup failures?
+	\end{lineref}
 \QuickQuizAnswer{
-	Suppose that a resize operation begins and distributes half of
-	the old table's buckets to the new table.
-	Suppose further that a thread adds a new element that goes into
-	one of the already-distributed buckets, and that this same thread
-	now looks up this newly added element.
-	If lookups unconditionally traversed only the old hash table,
-	this thread would get a lookup failure for the element that it
-	just added, which certainly sounds like a bug to me!
+	No, it won't.
+	Resizing into the new hash table leaves the old hash table
+	intact, courtesy of the pointer pairs.
 } \QuickQuizEnd
 
 \begin{lineref}[ln:datastruct:hash_resize:access:add]



