On Fri, Jan 18, 2019 at 10:32:51PM +0800, Junchang Wang wrote: > Hi Paul, > > Sorry for the late reply. Please see below, > > On Tue, Jan 8, 2019 at 2:32 AM Paul E. McKenney <paulmck@xxxxxxxxxxxxx> wrote: > > > > On Mon, Jan 07, 2019 at 09:49:19PM +0800, Junchang Wang wrote: > > > Hi all, > > > > > > I'm reading hash_resize recently, and have a few questions regarding > > > this algorithm. Please take a look if you have time. Any suggestions > > > are warmly welcomed. > > > > > > === Question 1 === > > > In hash_resize.c : hashtab_lock_mod > > > 186 if (b > READ_ONCE(htp->ht_resize_cur)) { > > > 187 lsp->hbp[1] = NULL; > > > 188 return; > > > 189 } > > > 190 htp = rcu_dereference(htp->ht_new); > > > > > > It seems we are missing a barrier (e.g., smp_mb) in between lines 189 > > > and 190, because neither READ_ONCE() nor rcu_dereference() can prevent > > > compilers and hardware from reordering the two unrelated variables, > > > ht_resize_cur and ht_new. Is my understanding correct? > > > > Ah, but hashtab_lock_mod() is invoked within an RCU read-side critical > > section and there is a synchronize_rcu() between the update to ->ht_new > > and the updates to ->ht_resize_cur. For more details on how this works, > > please see https://lwn.net/Articles/573497/. > > Thanks for the pointer to the article, which is very clear and helpful. Glad it helped! > > Of course, if you find a code path in which a call to hashtab_lock_mod() > > is invoked outside of an RCU read-side critical section, that would be > > a bug. (Can you tell me an exception to this rule, that is, a case > > where hashtab_lock_mod() could safely be invoked outside of an RCU > > read-side critical section?) > > > > > === Question 2 === > > > In hash_resize.c, each time an updater wants to access a bucket, the > > > updater must first acquire the bucket's lock (htb_lock), preventing > > > other updaters accessing the same bucket concurrently. This approach > > > is OK if the linked list of a bucket is relatively short, but for a > > > larger system where linked lists are long enough and the > > > perftest_resize thread is running simultaneously, it could become a > > > potential performance bottleneck. One naive solution is to allow > > > multiple updaters to access the same bucket, only if they don't > > > operate on the same item of the list of this bucket. I wonder if there > > > are any existing works or discussions on this topic? > > > > One approach is to use a hashed array of locks, and to hash a given > > element's address to locate the lock to be used. Please see > > Section 7.1.1.5 ("Conditional Locking") and Section 7.1.1.6 ("Acquire > > Needed Locks First"), including Quick Quiz 7.9, for additional details. > > > > Another approach is to use RCU to protect traversals, and locks within the > > linked-list elements themselves. These locks are conditionally acquired > > (again, please see Section 7.1.1.5), and deadlock is avoided by acquiring > > them in list order, and the tricks in Quick Quiz 7.9. > > > > Non-blocking synchronization can also be used, but it is often quite a > > bit more complicated. See for example the split-order list of Shalev > > and Shavit, along with Desnoyers's RCU-protected extension in the > > userspace RCU library. > > I have carefully read the related sections of perfbook, Shalev and > Shavit's split-ordered lists, Michael's lock-free linked list, and > Desnoyers's implementation. What a wonderful journey. Thanks a lot for > the pointers! It is an interesting algorithm! One limitation is the need to reverse the bits of the key. This is sometimes (but not always) a performance bottleneck. > > But it is usually -way- better to just choose a good hash function and > > to increase the number of buckets. Which is of course one reason for > > having resizable hash tables. ;-) > > > > But the other techniques can be useful in more complex linked data > > structures, such as graphs, where there is no reasonable way to > > partition the data. Nevertheless, many people choose to do the > > partitioning anyway, especially on distributed systems. > > > > > === Question 3 === > > > Chapter Data Structures also discusses other resizable hash tables, > > > namely "Resizable, scalable, concurrent hash tables via relativistic > > > programming" from Josh Triplett, which can save memory footprint by > > > using a single pair of pointers. But my understanding is that > > > perftest_resize.c is unique in that it allows you to rebuild the hash > > > table by utilizing a different hash function, which could be very > > > useful in practice (e.g., to prevent DDoS attack). Other solutions do > > > not share this property. Is my understanding correct? Did I miss any > > > discussions on this topic in perfbook? > > > > Indeed, to the best of my knowledge, Herbert Xu's pointer-pair approach > > (which I use in hash_resize.c) is the only one allowing arbitrary changes > > to hash functions. I expect that this advantage will become increasingly > > important as security issues become more challenging. Furthermore, I > > suspect that the pointer-pair approach is faster and more scalable. > > It is certainly simpler. > > > > On the other hand, one advantage of the other two approaches is decreased > > memory consumption. > > Agree. My understanding is that the other two approaches trade > flexibility for memory efficiency. > > > Another advantage of Josh Triplett's pointer-unzip approach is that > > concurrent updates are (in theory, anyway) not blocked for as long > > by resize operations. The other edge of this sword is that resizing > > is much slower, given the need to wait for many RCU grace periods. > > I have carefully read Triplett's paper. The most impressing part is > that the algorithm gives up some degree of internal consistency (a > reader can see items that are really for other buckets) to get better > performance. That tradeoff is a common theme running through almost all RCU-based algorithms. > > Another advantage of Mathieu Desnoyers's RCUified variant of Shalev > > and Shavit's split-order list is that all operations are non-blocking, > > which can be important on massively overloaded systems, such as one > > might find in cloud computing. > > My understanding is that if Herbert Xu's algorithm leverages lock-free > linked lists, it could also become non-blocking. Is that right? The > biggest challenge may come from resizing thread, which must atomically > delete an item from the old table and insert it into the new table. > This could be very interesting. For the applications that may benefit > from the non-blocking feature of the hash tables, could you please > elaborate more? Actually, there is no need for the resizing thread to remove the element from the old list. (I was confused on that point myself earlier in this thread.) So you only need to atomically add and atomically delete. There is no need to atomically move. Thanx, Paul > Thanks, > --Junchang > > > > === Question 4 === > > > In the current implementation of hash_resize.c, the perftest_resize > > > could block an updater, and vice versa. It seems this is not what we > > > expected. Ideally, they should be allowed to run concurrently, or at > > > least the perftest_resize thread should have lower priority and > > > updaters should never be blocked by the perftest_resize thread. Is > > > that right? I'm very interested in helping improve. Please let me know > > > if you have any suggestions. > > > > In hash_resize.c, an updater is blocked only for the time required to > > redisposition a bucket. This is a great improvement over blocking > > updaters for the full resize over all buckets. > > > > But yes, it is not hard to do better, for example, periodically dropping > > the old-table lock in hashtab_resize(). This requires a few careful > > adjustments, of course. Can you tell me what these adjustments are? > > > > Hmmm... I could simplify hashtab_lookup(), couldn't I? After all, > > optimizing for the race with hashtab_resize() doesn't make a whole lot > > of sense. Please see the patch below. Thoughts? > > > > Thanx, Paul > > > > ------------------------------------------------------------------------ > > > > commit 737646a9c868d841b32199b52f5569668975953e > > Author: Paul E. McKenney <paulmck@xxxxxxxxxxxxx> > > Date: Mon Jan 7 10:29:14 2019 -0800 > > > > datastruct/hash: Simplify hashtab_lookup() > > > > Because resizing leaves the old hash table intact, and because lookups > > are carried out within RCU read-side critical sections (which prevent > > a second resizing operation from starting), there is no need for a > > lookup to search anywhere but in the old hash table. And in the common > > case, there is no resize, so there is no new hash table. Therefore, > > eliminating the check for resizing speeds things up in the common > > case. In addition, this simplifies the code. > > > > This commit therefore eliminates the ht_get_bucket() function, > > renames the ht_get_bucket_single() function to ht_get_bucket(), > > and modifies callers appropriately. > > > > Signed-off-by: Paul E. McKenney <paulmck@xxxxxxxxxxxxx> > > > > diff --git a/CodeSamples/datastruct/hash/hash_resize.c b/CodeSamples/datastruct/hash/hash_resize.c > > index 29e05f907200..be4157959b83 100644 > > --- a/CodeSamples/datastruct/hash/hash_resize.c > > +++ b/CodeSamples/datastruct/hash/hash_resize.c > > @@ -124,8 +124,7 @@ void hashtab_free(struct hashtab *htp_master) > > //\begin{snippet}[labelbase=ln:datastruct:hash_resize:get_bucket,commandchars=\\\@\$] > > /* Get hash bucket corresponding to key, ignoring the possibility of resize. */ > > static struct ht_bucket * //\lnlbl{single:b} > > -ht_get_bucket_single(struct ht *htp, void *key, long *b, > > - unsigned long *h) > > +ht_get_bucket(struct ht *htp, void *key, long *b, unsigned long *h) > > { > > unsigned long hash = htp->ht_gethash(key); > > > > @@ -134,24 +133,6 @@ ht_get_bucket_single(struct ht *htp, void *key, long *b, > > *h = hash; //\lnlbl{single:h} > > return &htp->ht_bkt[*b]; //\lnlbl{single:return} > > } //\lnlbl{single:e} > > - > > -/* Get hash bucket correesponding to key, accounting for resize. */ > > -static struct ht_bucket * //\lnlbl{b} > > -ht_get_bucket(struct ht **htp, void *key, long *b, int *i) > > -{ > > - struct ht_bucket *htbp; > > - > > - htbp = ht_get_bucket_single(*htp, key, b, NULL); //\lnlbl{call_single} > > - //\fcvexclude > > - if (*b <= READ_ONCE((*htp)->ht_resize_cur)) { //\lnlbl{resized} > > - smp_mb(); /* order ->ht_resize_cur before ->ht_new. */ > > - *htp = rcu_dereference((*htp)->ht_new); //\lnlbl{newtable} > > - htbp = ht_get_bucket_single(*htp, key, b, NULL); //\lnlbl{newbucket} > > - } > > - if (i) //\lnlbl{chk_i} > > - *i = (*htp)->ht_idx; //\lnlbl{set_idx} > > - return htbp; //\lnlbl{return} > > -} //\lnlbl{e} > > //\end{snippet} > > > > /* Read-side lock/unlock functions. */ > > @@ -178,7 +159,7 @@ hashtab_lock_mod(struct hashtab *htp_master, void *key, > > > > rcu_read_lock(); //\lnlbl{l:rcu_lock} > > htp = rcu_dereference(htp_master->ht_cur); //\lnlbl{l:refhashtbl} > > - htbp = ht_get_bucket_single(htp, key, &b, &h); //\lnlbl{l:refbucket} > > + htbp = ht_get_bucket(htp, key, &b, &h); //\lnlbl{l:refbucket} > > spin_lock(&htbp->htb_lock); //\lnlbl{l:acq_bucket} > > lsp->hbp[0] = htbp; //\lnlbl{l:lsp0b} > > lsp->hls_idx[0] = htp->ht_idx; > > @@ -188,7 +169,7 @@ hashtab_lock_mod(struct hashtab *htp_master, void *key, > > return; //\lnlbl{l:fastret1} > > } > > htp = rcu_dereference(htp->ht_new); //\lnlbl{l:new_hashtbl} > > - htbp = ht_get_bucket_single(htp, key, &b, &h); //\lnlbl{l:get_newbkt} > > + htbp = ht_get_bucket(htp, key, &b, &h); //\lnlbl{l:get_newbkt} > > spin_lock(&htbp->htb_lock); //\lnlbl{l:acq_newbkt} > > lsp->hbp[1] = htbp; //\lnlbl{l:lsp1b} > > lsp->hls_idx[1] = htp->ht_idx; > > @@ -223,16 +204,15 @@ struct ht_elem * //\lnlbl{lkp:b} > > hashtab_lookup(struct hashtab *htp_master, void *key) > > { > > long b; > > - int i; > > struct ht *htp; > > struct ht_elem *htep; > > struct ht_bucket *htbp; > > > > htp = rcu_dereference(htp_master->ht_cur); //\lnlbl{lkp:get_curtbl} > > - htbp = ht_get_bucket(&htp, key, &b, &i); //\lnlbl{lkp:get_curbkt} > > + htbp = ht_get_bucket(htp, key, &b, NULL); //\lnlbl{lkp:get_curbkt} > > cds_list_for_each_entry_rcu(htep, //\lnlbl{lkp:loop:b} > > &htbp->htb_head, > > - hte_next[i]) { > > + hte_next[htp->ht_idx]) { > > if (htp->ht_cmp(htep, key)) //\lnlbl{lkp:match} > > return htep; //\lnlbl{lkp:ret_match} > > } //\lnlbl{lkp:loop:e} > > @@ -303,7 +283,7 @@ int hashtab_resize(struct hashtab *htp_master, > > htbp = &htp->ht_bkt[i]; //\lnlbl{get_oldcur} > > spin_lock(&htbp->htb_lock); //\lnlbl{acq_oldcur} > > cds_list_for_each_entry(htep, &htbp->htb_head, hte_next[idx]) { //\lnlbl{loop_list:b} > > - htbp_new = ht_get_bucket_single(htp_new, htp_new->ht_getkey(htep), &b, NULL); > > + htbp_new = ht_get_bucket(htp_new, htp_new->ht_getkey(htep), &b, NULL); > > spin_lock(&htbp_new->htb_lock); > > cds_list_add_rcu(&htep->hte_next[!idx], &htbp_new->htb_head); > > spin_unlock(&htbp_new->htb_lock); > > diff --git a/datastruct/datastruct.tex b/datastruct/datastruct.tex > > index 5c61bf5e2389..0152437c274e 100644 > > --- a/datastruct/datastruct.tex > > +++ b/datastruct/datastruct.tex > > @@ -966,10 +966,8 @@ the old table. > > \begin{lineref}[ln:datastruct:hash_resize:get_bucket] > > Bucket selection is shown in > > Listing~\ref{lst:datastruct:Resizable Hash-Table Bucket Selection}, > > -which shows \co{ht_get_bucket_single()} on > > -lines~\lnref{single:b}-\lnref{single:e} and > > -\co{ht_get_bucket()} on lines~\lnref{b}-\lnref{e}. > > -The \co{ht_get_bucket_single()} function returns a reference to the bucket > > +which shows \co{ht_get_bucket()}. > > +This function returns a reference to the bucket > > corresponding to the specified key in the specified hash table, without > > making any allowances for resizing. > > It also stores the bucket index corresponding to the key into the location > > @@ -978,36 +976,6 @@ line~\lnref{single:gethash}, and the corresponding > > hash value corresponding to the key into the location > > referenced by parameter~\co{h} (if non-\co{NULL}) on line~\lnref{single:h}. > > Line~\lnref{single:return} then returns a reference to the corresponding bucket. > > - > > -The \co{ht_get_bucket()} function handles hash-table selection, invoking > > -\co{ht_get_bucket_single()} on > > -line~\lnref{call_single} to select the bucket > > -corresponding to the hash in the current > > -hash table, storing the hash value through parameter~\co{b}. > > -If line~\lnref{resized} determines that the table is being resized and that > > -line~\lnref{call_single}'s bucket has already been distributed across the new hash > > -table, then line~\lnref{newtable} selects the new hash table and > > -line~\lnref{newbucket} > > -selects the bucket corresponding to the hash in the new hash table, > > -again storing the hash value through parameter~\co{b}. > > -\end{lineref} > > - > > -\QuickQuiz{} > > - The code in > > - Listing~\ref{lst:datastruct:Resizable Hash-Table Bucket Selection} > > - computes the hash twice! > > - Why this blatant inefficiency? > > -\QuickQuizAnswer{ > > - The reason is that the old and new hash tables might have > > - completely different hash functions, so that a hash computed > > - for the old table might be completely irrelevant to the > > - new table. > > -} \QuickQuizEnd > > - > > -\begin{lineref}[ln:datastruct:hash_resize:get_bucket] > > -If line~\lnref{chk_i} finds that parameter~\co{i} is non-\co{NULL}, then > > -line~\lnref{set_idx} stores the pointer-set index for the selected hash table. > > -Finally, line~\lnref{return} returns a reference to the selected hash bucket. > > \end{lineref} > > > > \QuickQuiz{} > > @@ -1021,10 +989,8 @@ Finally, line~\lnref{return} returns a reference to the selected hash bucket. > > functions described next. > > } \QuickQuizEnd > > > > -This implementation of > > -\co{ht_get_bucket_single()} and \co{ht_get_bucket()} > > -permit lookups and modifications to run concurrently > > -with a resize operation. > > +This implementation of \co{ht_get_bucket()} permits lookups and > > +modifications to run concurrently with a resize operation. > > > > \begin{listing}[tb] > > \input{CodeSamples/datastruct/hash/hash_resize@lock_unlock_mod.fcv} > > @@ -1129,11 +1095,6 @@ hash lookups. > > Line~\lnref{get_curtbl} fetches the current hash table and > > line~\lnref{get_curbkt} obtains a reference > > to the bucket corresponding to the specified key. > > -This bucket will be located in a new resized hash table when a > > -resize operation has progressed past the bucket in the old hash > > -table that contained the desired data element. > > -Note that line~\lnref{get_curbkt} also passes back the index that will be > > -used to select the correct set of pointers from the pair in each element. > > The loop spanning lines~\lnref{loop:b}-\lnref{loop:e} searches the bucket, > > so that if line~\lnref{match} > > detects a match, > > @@ -1144,22 +1105,17 @@ failure. > > \end{lineref} > > > > \QuickQuiz{} > > - In the \co{hashtab_lookup()} function in > > - Listing~\ref{lst:datastruct:Resizable Hash-Table Access Functions}, > > - the code carefully finds the right bucket in the new hash table > > - if the element to be looked up has already been distributed > > - by a concurrent resize operation. > > - This seems wasteful for RCU-protected lookups. > > - Why not just stick with the old hash table in this case? > > + \begin{lineref}[ln:datastruct:hash_resize:access:lkp] > > + What if execution reaches line~\lnref{loop:b} > > + of \co{hashtab_lookup()} in > > + Listing~\ref{lst:datastruct:Resizable Hash-Table Access Functions} > > + just after this bucket has been resized. > > + Won't that result in lookup failures? > > + \end{lineref} > > \QuickQuizAnswer{ > > - Suppose that a resize operation begins and distributes half of > > - the old table's buckets to the new table. > > - Suppose further that a thread adds a new element that goes into > > - one of the already-distributed buckets, and that this same thread > > - now looks up this newly added element. > > - If lookups unconditionally traversed only the old hash table, > > - this thread would get a lookup failure for the element that it > > - just added, which certainly sounds like a bug to me! > > + No, it won't. > > + Resizing into the new hash table leaves the old hash table > > + intact, courtesy of the pointer pairs. > > } \QuickQuizEnd > > > > \begin{lineref}[ln:datastruct:hash_resize:access:add] > > >