On Tue, Jan 08, 2019 at 08:06:51AM +0900, Akira Yokosawa wrote: > On 2019/01/08 07:54:16 +0900, Akira Yokosawa wrote: > > Hi Paul, > > > > On 2019/01/07 10:33:17 -0800, Paul E. McKenney wrote: > >> On Mon, Jan 07, 2019 at 09:49:19PM +0800, Junchang Wang wrote: > >>> Hi all, > >>> > >>> I'm reading hash_resize recently, and have a few questions regarding > >>> this algorithm. Please take a look if you have time. Any suggestions > >>> are warmly welcomed. > >>> > >>> === Question 1 === > >>> In hash_resize.c : hashtab_lock_mod > >>> 186 if (b > READ_ONCE(htp->ht_resize_cur)) { > >>> 187 lsp->hbp[1] = NULL; > >>> 188 return; > >>> 189 } > >>> 190 htp = rcu_dereference(htp->ht_new); > >>> > >>> It seems we are missing a barrier (e.g., smp_mb) in between lines 189 > >>> and 190, because neither READ_ONCE() nor rcu_dereference() can prevent > >>> compilers and hardware from reordering the two unrelated variables, > >>> ht_resize_cur and ht_new. Is my understanding correct? > >> > >> Ah, but hashtab_lock_mod() is invoked within an RCU read-side critical > >> section > > > > You mean "rcu_read_lock() at the beginning of hashtab_lock_mod() starts > > an RCU read-side critical section", don't you? > > > >> and there is a synchronize_rcu() between the update to ->ht_new > >> and the updates to ->ht_resize_cur. For more details on how this works, > >> please see https://lwn.net/Articles/573497/. > >> > >> Of course, if you find a code path in which a call to hashtab_lock_mod() > >> is invoked outside of an RCU read-side critical section, that would be > >> a bug. (Can you tell me an exception to this rule, that is, a case > >> where hashtab_lock_mod() could safely be invoked outside of an RCU > >> read-side critical section?) > >> > >>> === Question 2 === > >>> In hash_resize.c, each time an updater wants to access a bucket, the > >>> updater must first acquire the bucket's lock (htb_lock), preventing > >>> other updaters accessing the same bucket concurrently. This approach > >>> is OK if the linked list of a bucket is relatively short, but for a > >>> larger system where linked lists are long enough and the > >>> perftest_resize thread is running simultaneously, it could become a > >>> potential performance bottleneck. One naive solution is to allow > >>> multiple updaters to access the same bucket, only if they don't > >>> operate on the same item of the list of this bucket. I wonder if there > >>> are any existing works or discussions on this topic? > >> > >> One approach is to use a hashed array of locks, and to hash a given > >> element's address to locate the lock to be used. Please see > >> Section 7.1.1.5 ("Conditional Locking") and Section 7.1.1.6 ("Acquire > >> Needed Locks First"), including Quick Quiz 7.9, for additional details. > >> > >> Another approach is to use RCU to protect traversals, and locks within the > >> linked-list elements themselves. These locks are conditionally acquired > >> (again, please see Section 7.1.1.5), and deadlock is avoided by acquiring > >> them in list order, and the tricks in Quick Quiz 7.9. > >> > >> Non-blocking synchronization can also be used, but it is often quite a > >> bit more complicated. See for example the split-order list of Shalev > >> and Shavit, along with Desnoyers's RCU-protected extension in the > >> userspace RCU library. > >> > >> But it is usually -way- better to just choose a good hash function and > >> to increase the number of buckets. Which is of course one reason for > >> having resizable hash tables. ;-) > >> > >> But the other techniques can be useful in more complex linked data > >> structures, such as graphs, where there is no reasonable way to > >> partition the data. Nevertheless, many people choose to do the > >> partitioning anyway, especially on distributed systems. > >> > >>> === Question 3 === > >>> Chapter Data Structures also discusses other resizable hash tables, > >>> namely "Resizable, scalable, concurrent hash tables via relativistic > >>> programming" from Josh Triplett, which can save memory footprint by > >>> using a single pair of pointers. But my understanding is that > >>> perftest_resize.c is unique in that it allows you to rebuild the hash > >>> table by utilizing a different hash function, which could be very > >>> useful in practice (e.g., to prevent DDoS attack). Other solutions do > >>> not share this property. Is my understanding correct? Did I miss any > >>> discussions on this topic in perfbook? > >> > >> Indeed, to the best of my knowledge, Herbert Xu's pointer-pair approach > >> (which I use in hash_resize.c) is the only one allowing arbitrary changes > >> to hash functions. I expect that this advantage will become increasingly > >> important as security issues become more challenging. Furthermore, I > >> suspect that the pointer-pair approach is faster and more scalable. > >> It is certainly simpler. > >> > >> On the other hand, one advantage of the other two approaches is decreased > >> memory consumption. > >> > >> Another advantage of Josh Triplett's pointer-unzip approach is that > >> concurrent updates are (in theory, anyway) not blocked for as long > >> by resize operations. The other edge of this sword is that resizing > >> is much slower, given the need to wait for many RCU grace periods. > >> > >> Another advantage of Mathieu Desnoyers's RCUified variant of Shalev > >> and Shavit's split-order list is that all operations are non-blocking, > >> which can be important on massively overloaded systems, such as one > >> might find in cloud computing. > >> > >>> === Question 4 === > >>> In the current implementation of hash_resize.c, the perftest_resize > >>> could block an updater, and vice versa. It seems this is not what we > >>> expected. Ideally, they should be allowed to run concurrently, or at > >>> least the perftest_resize thread should have lower priority and > >>> updaters should never be blocked by the perftest_resize thread. Is > >>> that right? I'm very interested in helping improve. Please let me know > >>> if you have any suggestions. > >> > >> In hash_resize.c, an updater is blocked only for the time required to > >> redisposition a bucket. This is a great improvement over blocking > >> updaters for the full resize over all buckets. > >> > >> But yes, it is not hard to do better, for example, periodically dropping > >> the old-table lock in hashtab_resize(). This requires a few careful > >> adjustments, of course. Can you tell me what these adjustments are? > >> > >> Hmmm... I could simplify hashtab_lookup(), couldn't I? After all, > >> optimizing for the race with hashtab_resize() doesn't make a whole lot > >> of sense. Please see the patch below. Thoughts? > >> > >> Thanx, Paul > >> > >> ------------------------------------------------------------------------ > >> > >> commit 737646a9c868d841b32199b52f5569668975953e > >> Author: Paul E. McKenney <paulmck@xxxxxxxxxxxxx> > >> Date: Mon Jan 7 10:29:14 2019 -0800 > >> > >> datastruct/hash: Simplify hashtab_lookup() > >> > >> Because resizing leaves the old hash table intact, and because lookups > >> are carried out within RCU read-side critical sections (which prevent > >> a second resizing operation from starting), there is no need for a > >> lookup to search anywhere but in the old hash table. And in the common > >> case, there is no resize, so there is no new hash table. Therefore, > >> eliminating the check for resizing speeds things up in the common > >> case. In addition, this simplifies the code. > >> > >> This commit therefore eliminates the ht_get_bucket() function, > >> renames the ht_get_bucket_single() function to ht_get_bucket(), > >> and modifies callers appropriately. > >> > >> Signed-off-by: Paul E. McKenney <paulmck@xxxxxxxxxxxxx> > >> > >> diff --git a/CodeSamples/datastruct/hash/hash_resize.c b/CodeSamples/datastruct/hash/hash_resize.c > >> index 29e05f907200..be4157959b83 100644 > >> --- a/CodeSamples/datastruct/hash/hash_resize.c > >> +++ b/CodeSamples/datastruct/hash/hash_resize.c > >> @@ -124,8 +124,7 @@ void hashtab_free(struct hashtab *htp_master) > >> //\begin{snippet}[labelbase=ln:datastruct:hash_resize:get_bucket,commandchars=\\\@\$] > >> /* Get hash bucket corresponding to key, ignoring the possibility of resize. */ > >> static struct ht_bucket * //\lnlbl{single:b} > >> -ht_get_bucket_single(struct ht *htp, void *key, long *b, > >> - unsigned long *h) > >> +ht_get_bucket(struct ht *htp, void *key, long *b, unsigned long *h) > >> { > >> unsigned long hash = htp->ht_gethash(key); > >> > >> @@ -134,24 +133,6 @@ ht_get_bucket_single(struct ht *htp, void *key, long *b, > >> *h = hash; //\lnlbl{single:h} > >> return &htp->ht_bkt[*b]; //\lnlbl{single:return} > >> } //\lnlbl{single:e} > >> - > >> -/* Get hash bucket correesponding to key, accounting for resize. */ > >> -static struct ht_bucket * //\lnlbl{b} > >> -ht_get_bucket(struct ht **htp, void *key, long *b, int *i) > >> -{ > >> - struct ht_bucket *htbp; > >> - > >> - htbp = ht_get_bucket_single(*htp, key, b, NULL); //\lnlbl{call_single} > >> - //\fcvexclude > >> - if (*b <= READ_ONCE((*htp)->ht_resize_cur)) { //\lnlbl{resized} > >> - smp_mb(); /* order ->ht_resize_cur before ->ht_new. */ > > > > If we can remove this memory barrier, the counterpart smp_mb() in > > hashtab_resize() becomes unnecessary, doesn't it? > > And the WRITE_ONCE() in the following line. Actually, that must stay. It is true that the bucket lock is held by hashtab_unlock_mod(), and that this prevents concurrent resizing of that bucket, but other buckets might well be resized, which results in the possibiliity of concurrent reads and writes for ->ht_resize_cur. Anyway, here is the resulting commit. Thoughts? Thanx, Paul ------------------------------------------------------------------------ commit 1aac0c703482c90c2ce4092b2cc604d474f5a44b Author: Paul E. McKenney <paulmck@xxxxxxxxxxxxx> Date: Mon Jan 7 15:39:40 2019 -0800 datastruct/hash: Remove extraneous barrier from hashtab_resize() Now that hashtab_lookup() is iresizing-agnostic, all non-initialization accesses to ->ht_resize-cur are protected by locking in the restricted sense that any change to ->ht_resize_cur that would change the value of the "if" condition cannot happen while the lock is held on the old bucket. This means that the memory barrier may be removed. However, the READ_ONCE() and WRITE_ONCE() markings on non-initialization accesses to ->ht_resize_cur must remain because reads from ->ht_resize_cur really can race with writes, just not is a way to change the "if" conditions. Reported-by: Akira Yokosawa <akiyks@xxxxxxxxx> Signed-off-by: Paul E. McKenney <paulmck@xxxxxxxxxxxxx> diff --git a/CodeSamples/datastruct/hash/hash_resize.c b/CodeSamples/datastruct/hash/hash_resize.c index be4157959b83..9f68a00dabe3 100644 --- a/CodeSamples/datastruct/hash/hash_resize.c +++ b/CodeSamples/datastruct/hash/hash_resize.c @@ -288,7 +288,6 @@ int hashtab_resize(struct hashtab *htp_master, cds_list_add_rcu(&htep->hte_next[!idx], &htbp_new->htb_head); spin_unlock(&htbp_new->htb_lock); } //\lnlbl{loop_list:e} - smp_mb(); /* Fill new buckets before claiming them. */ WRITE_ONCE(htp->ht_resize_cur, i); //\lnlbl{update_resize} spin_unlock(&htbp->htb_lock); //\lnlbl{rel_oldcur} } //\lnlbl{loop:e} diff --git a/datastruct/datastruct.tex b/datastruct/datastruct.tex index 0152437c274e..e2159330790f 100644 --- a/datastruct/datastruct.tex +++ b/datastruct/datastruct.tex @@ -1245,10 +1245,7 @@ lines~\lnref{loop_list:b}-\lnref{loop_list:e} adds one data element from the current old-table bucket to the corresponding new-table bucket, holding the new-table bucket's lock during the add operation. Line~\lnref{update_resize} updates -\co{->ht_resize_cur} to indicate that this bucket has been distributed, -and is preceded by a full memory barrier that pairs with the one in -\co{ht_get_bucket()} shown in -Listing~\ref{lst:datastruct:Resizable Hash-Table Bucket Selection}. +\co{->ht_resize_cur} to indicate that this bucket has been distributed. Finally, line~\lnref{rel_oldcur} releases the old-table bucket lock. \QuickQuiz{}