Re: Question regarding hash_resize

Akira Yokosawa <akiyks@xxxxxxxxx> · Sat, 12 Jan 2019 07:56:48 +0900

On 2019/01/11 07:43:14 -0800, Paul E. McKenney wrote:
> On Fri, Jan 11, 2019 at 11:25:18PM +0900, Akira Yokosawa wrote:
>> On 2019/01/11 13:08, Paul E. McKenney wrote:
>>> On Tue, Jan 08, 2019 at 06:59:13PM -0800, Paul E. McKenney wrote:
>>>> On Tue, Jan 08, 2019 at 04:19:59PM -0800, Paul E. McKenney wrote:
>>>>> On Wed, Jan 09, 2019 at 07:16:05AM +0900, Akira Yokosawa wrote:
>>>>>> On 2019/01/08 10:39:31 -0800, Paul E. McKenney wrote:
>>>>>>> On Wed, Jan 09, 2019 at 12:35:37AM +0900, Akira Yokosawa wrote:
>>>>>>>> On 2019/01/09 0:28, Paul E. McKenney wrote:
>>>>>>>>> On Tue, Jan 08, 2019 at 09:56:57AM +0800, Junchang Wang wrote:
>>>>>>>>>> On Tue, Jan 8, 2019 at 7:06 AM Akira Yokosawa <akiyks@xxxxxxxxx> wrote:
>>>>>>>>>>> On 2019/01/08 07:54:16 +0900, Akira Yokosawa wrote:
>>>>>
>>>>> [ . . . ]
>>>>>
>>>>>>>>>> Hi Paul and Akira,
>>>>>>>>>>
>>>>>>>>>> Thanks a lot for the comments, which I need some more time to look
>>>>>>>>>> into. For Paul's patch, I have a few concerns. Please take a look.
>>>>>>>>>>
>>>>>>>>>> My understanding is that with this path, during the time period when
>>>>>>>>>> the resizing thread is running, an updater may insert/delete an item
>>>>>>>>>> into/from the new hash table, while readers are still looking up data
>>>>>>>>>> in the old one, resulting the readers are unaware of
>>>>>>>>>> insertions/deletions happening simultaneously. For example, it seems
>>>>>>>>>> the following sequence could happen.
>>>>>>>>>>
>>>>>>>>>> 1. The resizing thread starts.
>>>>>>>>>> 2. The resizing thread successfully passes bucket *B* of the old hash table.
>>>>>>>>>> 3. An updater wants to insert a new item *I* which should be inserted
>>>>>>>>>> into bucket *B*.
>>>>>>>>>> 4. The updater will select the new hash table and insert the item *I*
>>>>>>>>>> into the new hash table.
>>>>>>>>>> 5. A read request comes in and wants to lookup item *I*. The lookup
>>>>>>>>>> request will check the old hash table and fail. Doesn't it?
>>>>>>>>>> 6. The resizing thread exits.
>>>>>>>>>> 7. Now subsequent read requests can successfully find item *I*.
>>>>>>>>>
>>>>>>>>> Yes, this can happen.
>>>>>>>>>
>>>>>>>>>> Is my understanding correct? Please let me know if I misunderstood
>>>>>>>>>> anything. Give the truth that this patch can accelerate the fast path,
>>>>>>>>>> I think it should be OK because resizing is typically happen rarely.
>>>>>>>>>> Just want to make sure I fully understand the algorithm.
>>>>>>>>>
>>>>>>>>> It is a design choice, and some users would prefer not to fail to see
>>>>>>>>> new items during a resize.  One approach would be to revert back to
>>>>>>>>> the old-style checking, and another would be to provide a separate
>>>>>>>>> lookup interface that synchronizes with adds and deletes.
>>>>>>>>>
>>>>>>>>> So, I could add a quick quiz with this information, I could revert the
>>>>>>>>> change, or I could add another lookup function that provided more timely
>>>>>>>>> information.  Left to myself, I would provide a quick quiz, but what
>>>>>>>>> do you guys think?
>>>>>>>>
>>>>>>>> Hi, I was composing a message, but now I'm replying to this one.
>>>>>>>> I think adding a quick quiz would be a good idea.
>>>>>>>
>>>>>>> But in the meantime, it occurred to me that I was looking at the
>>>>>>> problem in the wrong way.  I believe that the following patch makes
>>>>>>> hashtab_lookup() find elements recently added by hashtab_add(), even
>>>>>>> during a resize, and without the need for memory barriers.
>>>>>>>
>>>>>>> The scenario that convinced me to take this approach is when a thread
>>>>>>> does hashtab_add(), then immediately searches for the newly added element.
>>>>>>> Failing to find it would be quite a surprise to most people.
>>>>>>
>>>>>> When a thread does hashtab_del() and immediately checks the deletion,
>>>>>> it still finds the deleted element while resizing is in progress.
>>>>>> This would also be a surprise. Current version looks less consistent
>>>>>> than the simpler one did.
>>>>>
>>>>> I bet I can fix that...  Famous last words!  ;-)
>>>>>
>>>>> But please see below and tell me what you think.
>>>>
>>>> Well, that is not quite right, but close.  Working on it...
>>>
>>> Seems to be stable.  I have not yet updated the text.  I am currently
>>> looking into whether I can get rid of ->ht_resize_cur.
>>
>> Without ->ht_resize_cur, it would be hard (or impossible) for
>> hashtab_lock_mod() to see if bucket in the new table needs
>> to be locked, wouldn't it?
> 
> The thought is to just unconditionally lock the buckets in both tables
> when resizing.  This would of course result in added locking overhead
> for adds and deletes, but only during resizing.
> 
>> I have another idea to simplify the code, with the possibility of
>> increasing update-side cost during resizing.
>>
>> By keeping updating ->cur hashtab until the resizing finished,
>> hashtab_lookup() only needs to see ->cur.
>> hashtab_add() and hashtab_del() update both buckets if hashtab_lock_mod()
>> has locked two buckets.
> 
> Now that hashtab_lookup() does a lookup in both buckets, it should not be
> necessary to add to both buckets.  But you are right that it should not
> hurt, given that there is no API for iterating over the entire hash table.
> And I do see how it reduces the non-resize-time cost of failed lookups,
> which is a good thing.
> 
> So this is a promising approach!  One question interspersed below.
> 
> 							Thanx, Paul
> 
>> I've added the code as the alternative code to avoid messing line lables.
>> The diff is appended below.
>>
>> Thoughts?
>>
>> Note: By adding a "-DFCV_SNIPPET" flag to the compiler, you can compile
>> your version of the code.
>>
>>         Thanks, Akira
>>
>>>                                                        In theory, this
>>> would make it trivial to make the resizing "pause", releasing the lock
>>> from time to time.
>>>
>>> For whatever it is worth...
>>>
>>> 							Thanx, Paul
>>>
>>
>> diff --git a/CodeSamples/datastruct/hash/hash_resize.c b/CodeSamples/datastruct/hash/hash_resize.c
>> index 9f9fe8c..35bc847 100644
>> --- a/CodeSamples/datastruct/hash/hash_resize.c
>> +++ b/CodeSamples/datastruct/hash/hash_resize.c
>> @@ -30,7 +30,11 @@
>>  struct ht_elem {
>>  	struct rcu_head rh;
>>  	struct cds_list_head hte_next[2];		//\lnlbl{ht_elem:next}
>> +#ifndef FCV_SNIPPET
>> +	unsigned long hte_hash[2];
>> +#else /* #ifndef FCV_SNIPPET */
>>  	unsigned long hte_hash;
>> +#endif /* #ifndef FCV_SNIPPET */
>>  };
>>  
>>  /* Hash-table bucket element. */
>> @@ -230,12 +234,16 @@ hashtab_lookup(struct hashtab *htp_master, void *key)
>>  
>>  	htp = rcu_dereference(htp_master->ht_cur);	//\lnlbl{lkp:get_curtbl}
>>  	htep = ht_search_bucket(htp, key);		//\lnlbl{lkp:get_curbkt}
>> +#ifndef FCV_SNIPPET
>> +	return htep;
>> +#else /* #ifndef FCV_SNIPPET */
>>  	if (htep)					//\lnlbl{lkp:entchk}
>>  		return htep;				//\lnlbl{lkp:ret_match}
>>  	htp = rcu_dereference(htp->ht_new);		//\lnlbl{lkp:get_nxttbl}
>>  	if (!htp)					//\lnlbl{lkp:htpchk}
>>  		return NULL;				//\lnlbl{lkp:noresize}
>>  	return ht_search_bucket(htp, key);		//\lnlbl{lkp:ret_nxtbkt}
>> +#endif /* #ifndef FCV_SNIPPET */
>>  }							//\lnlbl{lkp:e}
>>  
>>  /*
>> @@ -248,9 +256,18 @@ void hashtab_add(struct ht_elem *htep,			//\lnlbl{add:b}
>>  	struct ht_bucket *htbp = lsp->hbp[0];		//\lnlbl{add:htbp}
>>  	int i = lsp->hls_idx[0];			//\lnlbl{add:i}
>>  
>> +#ifndef FCV_SNIPPET
>> +	htep->hte_hash[0] = lsp->hls_hash[0];
>> +	cds_list_add_rcu(&htep->hte_next[i], &htbp->htb_head);
>> +	if ((htbp = lsp->hbp[1])) {
>> +	  htep->hte_hash[1] = lsp->hls_hash[1];
>> +	  cds_list_add_rcu(&htep->hte_next[!i], &htbp->htb_head);
>> +	}
>> +#else /* #ifndef FCV_SNIPPET */
>>  	htep->hte_hash = lsp->hls_hash[0];		//\lnlbl{add:hash}
>>  	htep->hte_next[!i].prev = NULL;			//\lnlbl{add:initp}
>>  	cds_list_add_rcu(&htep->hte_next[i], &htbp->htb_head); //\lnlbl{add:add}
>> +#endif /* #ifndef FCV_SNIPPET */
>>  }							//\lnlbl{add:e}
>>  
>>  /*
>> @@ -262,6 +279,11 @@ void hashtab_del(struct ht_elem *htep,			//\lnlbl{del:b}
>>  {
>>  	int i = lsp->hls_idx[0];			//\lnlbl{del:i}
>>  
>> +#ifndef FCV_SNIPPET
>> +	cds_list_del_rcu(&htep->hte_next[i]);
>> +	if (lsp->hbp[1])
>> +	  cds_list_del_rcu(&htep->hte_next[!i]);
> 
> What if the element was added before the resize started, so that it is not
> in the new version of the table?  Or are we initializing each element's
> pointers to point to themselves somewhere that I am currently blind to?

You are considering unconditionally locking both buckets, but
this version relies on the update of ->ht_resize_cur.

That hashtab_lock_mod() returns a pointer in lsp->hbp[1] means
the resizing has distributed the bucket to new table. Therefore,
the element is assured to be in the new bucket.

And it looks like I used indent by 2 spaces in the added code
in the diff. I'll fix them if you'd like to take this approach.

        Thanks, Akira

> 
>> +#else /* #ifndef FCV_SNIPPET */
>>  	if (htep->hte_next[i].prev) {			//\lnlbl{del:if}
>>  		cds_list_del_rcu(&htep->hte_next[i]);	//\lnlbl{del:del}
>>  		htep->hte_next[i].prev = NULL;		//\lnlbl{del:init}
>> @@ -270,6 +292,7 @@ void hashtab_del(struct ht_elem *htep,			//\lnlbl{del:b}
>>  		cds_list_del_rcu(&htep->hte_next[!i]);	//\lnlbl{del:delnew}
>>  		htep->hte_next[!i].prev = NULL;		//\lnlbl{del:initnew}
>>  	}
>> +#endif /* #ifndef FCV_SNIPPET */
>>  }							//\lnlbl{del:e}
>>  //\end{snippet}
>>  
>> @@ -350,5 +373,9 @@ void defer_del_rcu(struct rcu_head *rhp)
>>  
>>  #define quiescent_state() rcu_quiescent_state()
>>  
>> +#ifndef FCV_SNIPPET
>> +#define check_hash() (htep->hte_hash[0] != hash && htep->hte_hash[1] != hash)
>> +#endif /* #ifndef FCV_SNIPPET */
>> +
>>  #include "hashtorture.h"
>>  #endif /* #ifdef TEST_HASH */
>> diff --git a/CodeSamples/datastruct/hash/hashtorture.h b/CodeSamples/datastruct/hash/hashtorture.h
>> index 6f47baa..d6345cc 100644
>> --- a/CodeSamples/datastruct/hash/hashtorture.h
>> +++ b/CodeSamples/datastruct/hash/hashtorture.h
>> @@ -62,6 +62,10 @@ void (*defer_del_done)(struct ht_elem *htep) = NULL;
>>  #define rcu_barrier() do ; while (0)
>>  #endif /* #ifndef quiescent_state */
>>  
>> +#ifndef check_hash
>> +#define check_hash() (htep->hte_hash != hash)
>> +#endif /* #ifndef check_hash */
>> +
>>  /*
>>   * Test variables.
>>   */
>> @@ -988,7 +992,7 @@ int zoo_lookup(char *key)
>>  	htep = hashtab_lookup(perftest_htp, hash, key);
>>  	zhep = container_of(htep, struct zoo_he, zhe_e);
>>  	BUG_ON(htep &&
>> -	       (htep->hte_hash != hash ||
>> +	       (check_hash() ||
>>  	        strncmp(zhep->name, (char *)key, ZOO_NAMELEN) != 0));
>>  	hashtab_unlock_lookup(perftest_htp, hash);
>>  	hashtab_lookup_done(htep);
>>
>>
>