Re: [PATCH] EXP hashtorture.h: Avoid sporadic SIGSEGV in hash_bkt_rcu

On 2019/01/01 10:00:25 -0800, Paul E. McKenney wrote:
> On Tue, Jan 01, 2019 at 09:27:41AM +0900, Akira Yokosawa wrote:
>> On 2018/12/31 13:03:07 -0800, Paul E. McKenney wrote:
>>> On Tue, Jan 01, 2019 at 12:15:23AM +0900, Akira Yokosawa wrote:
>>>> From 52f5d218442eb64f2798335d56a1838f90d96d5f Mon Sep 17 00:00:00 2001
>>>> From: Akira Yokosawa <akiyks@xxxxxxxxx>
>>>> Date: Mon, 30 Dec 2018 22:54:43 +0900
>>>> Subject: [PATCH] EXP hashtorture.h: Avoid sporadic SIGSEGV in hash_bkt_rcu
>>>>
>>>> Commit 4e22bdc905ff ("Wait at end of test for call_rcu() to finish")
>>>> added a couple of synchronize_rcu()s in perftest_update()
>>>> and zoo_reader().
>>>>
>>>> However, a sporadic SIGSEGV still remains in
>>>>
>>>>     $ ./hash_bkt_rcu --perftest --nupdaters 3
>>>>
>>>> On the other hand,
>>>>
>>>>     $ ./hash_bkt_rcu --schroedinger --nupdaters 3
>>>>
>>>> does not show such an issue. Just moving the synchronize_rcu()
>>>> calls from zoo_reader() to zoo_updater() does not resolve the
>>>> SIGSEGV.
>>>>
>>>>
>>>> This commit defines rcu_barrier() if it is not available,
>>>> and calls it both before and after the final loop
>>>> of perftest_updater() and zoo_updater().
>>>>
>>>> This change appears to fix the above-mentioned
>>>> SIGSEGV in "--perftest".
>>>>
>>>> [Tested on Ubuntu Xenial with liburcu-dev/xenial,now 0.9.1-3 and
>>>> liburcu4/xenial,now 0.9.1-3 installed.]
>>>>
>>>> NOTE:
>>>>
>>>>     $ ./hash_resize --schroedinger --resizemult 2 --duration 20
>>>
>>> I get SIGSEGV and hangs from time to time, so I am looking into this.
>>> Thank you for calling it to my attention!
>>
>> I've found some suspicious code in hash_resize.c.
>>
>> hashtab_lock_mod() accounts for an ongoing resize and takes
>> spin_lock() on the new bucket if necessary. This is good for add,
>> but for delete we may still need to lock the old bucket.
>>
>> And hashtab_unlock_mod() doesn't check for an ongoing resize, so
>> the spin_lock() and spin_unlock() calls can be mismatched.
>>
>> Also, htp_master->ht_cur can change during the
>> hashtab_lock_mod() -- hashtab_unlock_mod() critical section
>> because the update of the pointer by rcu_assign_pointer()
>> is ahead of synchronize_rcu().
>>
>> Given that resizing is infrequent, the simplest way might be to
>> block hashtab_lock_mod() while resizing is going on.
> 
> I do believe you have found something here, and thank you!  So the
> answer to my earlier question as to whether I was smarter when writing
> it than now is clearly that I was equally stupid in both cases.  ;-)
> 
> Well, it is conference-driven code, but still high time for me to
> clean it up.
> 
>> There may be a better way to keep add/del/resize concurrent, though.
>> Happy hacking! ;-)
> 
> I do believe that I can preserve concurrency between resizing and
> deletion, but that is clearly for me to prove.
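
The lock/unlock pairing concern in the quoted text above can be illustrated
with a small sketch. It is not the code in hash_resize.c and not a proposed
patch. All names here (toy_bucket, toy_table, toy_lock_state, toy_lock_mod,
toy_unlock_mod, toy_table_init) are hypothetical, and pthread mutexes stand
in for spin_lock()/spin_unlock(). The idea shown is only this: the lock
function records which bucket locks it actually took, and the unlock
function releases exactly those, so a change in resize state between the
two calls cannot cause a mismatch.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical toy types; these are NOT the structures in hash_resize.c. */
struct toy_bucket {
	pthread_mutex_t lock;
};

struct toy_table {
	struct toy_bucket *bkt;   /* Bucket array. */
	unsigned long nbuckets;
	long resize_cur;          /* Highest old bucket already moved, or -1. */
	struct toy_table *newtab; /* New table while resizing, else NULL. */
};

/* Records the locks taken so that unlock never has to guess. */
struct toy_lock_state {
	struct toy_bucket *held[2]; /* [0]: old bucket, [1]: new bucket or NULL. */
};

void toy_table_init(struct toy_table *tp, struct toy_bucket *bkt,
		    unsigned long n)
{
	unsigned long i;

	assert(n > 0);
	tp->bkt = bkt;
	tp->nbuckets = n;
	tp->resize_cur = -1;
	tp->newtab = NULL;
	for (i = 0; i < n; i++)
		pthread_mutex_init(&bkt[i].lock, NULL);
}

/*
 * Lock the old bucket, and also the new bucket if this old bucket has
 * already been moved.  Assumption: a resizer would hold the old bucket's
 * lock while moving it, so the resize state is sampled under that lock.
 */
void toy_lock_mod(struct toy_table *tp, unsigned long hash,
		  struct toy_lock_state *lsp)
{
	unsigned long idx = hash % tp->nbuckets;
	struct toy_bucket *bp = &tp->bkt[idx];

	pthread_mutex_lock(&bp->lock);
	lsp->held[0] = bp;
	lsp->held[1] = NULL;
	if (tp->newtab && (long)idx <= tp->resize_cur) {
		struct toy_bucket *nbp =
			&tp->newtab->bkt[hash % tp->newtab->nbuckets];

		pthread_mutex_lock(&nbp->lock);
		lsp->held[1] = nbp;
	}
}

/* Release exactly what toy_lock_mod() recorded, in reverse order. */
void toy_unlock_mod(struct toy_lock_state *lsp)
{
	if (lsp->held[1])
		pthread_mutex_unlock(&lsp->held[1]->lock);
	pthread_mutex_unlock(&lsp->held[0]->lock);
}
```

A caller would pass the same toy_lock_state to both functions around an
add or delete. Whether this shape fits hash_resize.c, and whether it
preserves the desired concurrency with resizing, is not something this
sketch establishes.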

There is one more thing I've noticed with "hash_resize --schroedinger".
*Without* resizing enabled, it says:

    $ ./hash_resize --schroedinger
    nlookups: 91373 91373  ncats: 0  nadds: 5  ndels: 6  duration: 10.851
    ns/read: 118.755  ns/update: 986455

The second nlookups value equals the first, which means that all the
lookups failed. OTOH, hash_bkt_rcu works as expected, as follows:

    $ ./hash_bkt_rcu --schroedinger
    nlookups: 56064 28004  ncats: 0  nadds: 5  ndels: 5  duration: 10.373
    ns/read: 185.021  ns/update: 1.0373e+06

(ns/read looks slow because compiler optimization is disabled.)

There seems to be some mismatch in hash/key handling between hash_resize.c
and hashtorture.h. I've not yet figured out the cause, though.

        Thanks, Akira

> 
> And thank you again!
> 
> 							Thanx, Paul
> 
>>         Thanks, Akira
>>>
>>>> still fails with SIGSEGV frequently in zoo_del(). GDB says:
>>>>
>>>>     (gdb) where
>>>>     #0  0x0000000000402b27 in cds_list_del_rcu (elem=0x7ff8fc0138f0)
>>>>         at /usr/include/urcu/rculist.h:71
>>>>     #1  hashtab_del (htep=0x7ff8fc0138d0, htp_master=<optimized out>)
>>>>         at hash_resize.c:261
>>>>     #2  zoo_del (zhep=0x7ff8fc0138d0) at hashtorture.h:1007
>>>>     #3  zoo_updater (arg=0x1e8b298) at hashtorture.h:1153
>>>>     #4  0x00007ff9057d16ba in start_thread (arg=0x7ff903fed700)
>>>>         at pthread_create.c:333
>>>>     #5  0x00007ff9050f741d in clone ()
>>>>         at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
>>>>
>>>> Signed-off-by: Akira Yokosawa <akiyks@xxxxxxxxx>
>>>
>>> Good catch, queued and pushed, thank you!
>>>
>>> With one small modification -- given that liburcu has had rcu_barrier()
>>> for some years now, I removed the "training wheels" (and unreliable)
>>> use of the wait and pair of synchronize_rcu() calls.
>>>
>>>> ---
>>>> Hi Paul,
>>>>
>>>> This is a partial fix, but it resolves SIGSEGV in "--perftest" of
>>>> hash_bkt_rcu and hash_resize.
>>>>
>>>> "--schroedinger" of hash_resize with resizing enabled still seg faults
>>>> as mentioned in the commit log.
>>>>
>>>> By the way, what version of liburcu are you using?
>>>
>>> It is about two years old, but it does have rcu_barrier().
>>>
>>> 								Thanx, Paul
>>>
>>>>         Thanks, Akira
>>>> --
>>>>  CodeSamples/datastruct/hash/hashtorture.h | 24 ++++++++++++++++--------
>>>>  1 file changed, 16 insertions(+), 8 deletions(-)
>>>>
>>>> diff --git a/CodeSamples/datastruct/hash/hashtorture.h b/CodeSamples/datastruct/hash/hashtorture.h
>>>> index 0e90220..9ae3dfa 100644
>>>> --- a/CodeSamples/datastruct/hash/hashtorture.h
>>>> +++ b/CodeSamples/datastruct/hash/hashtorture.h
>>>> @@ -55,6 +55,15 @@ void (*defer_del_done)(struct ht_elem *htep) = NULL;
>>>>  #ifndef quiescent_state
>>>>  #define quiescent_state() do ; while (0)
>>>>  #define synchronize_rcu() do ; while (0)
>>>> +#define rcu_barrier() do ; while (0)
>>>> +#else
>>>> +#ifndef rcu_barrier
>>>> +#define rcu_barrier() do { \
>>>> +		synchronize_rcu(); \
>>>> +		poll(NULL, 0, 100); \
>>>> +		synchronize_rcu(); \
>>>> +	} while (0)
>>>> +#endif /* #ifndef rcu_barrier */
>>>>  #endif /* #ifndef quiescent_state */
>>>>  
>>>>  /*
>>>> @@ -765,6 +774,7 @@ void *perftest_reader(void *arg)
>>>>  		if (i >= ne)
>>>>  			i = i % ne + offset;
>>>>  	}
>>>> +
>>>>  	pap->nlookups = nlookups;
>>>>  	pap->nlookupfails = nlookupfails;
>>>>  	hash_unregister_thread();
>>>> @@ -839,6 +849,7 @@ void *perftest_updater(void *arg)
>>>>  			quiescent_state();
>>>>  	}
>>>>  
>>>> +	rcu_barrier();
>>>>  	/* Test over, so remove all our elements from the hash table. */
>>>>  	for (i = 0; i < elperupdater; i++) {
>>>>  		if (thep[i].in_table != 1)
>>>> @@ -846,10 +857,7 @@ void *perftest_updater(void *arg)
>>>>  		BUG_ON(!perftest_lookup(thep[i].data));
>>>>  		perftest_del(&thep[i]);
>>>>  	}
>>>> -	/* Really want rcu_barrier(), but missing from old liburcu versions. */
>>>> -	synchronize_rcu();
>>>> -	poll(NULL, 0, 100);
>>>> -	synchronize_rcu();
>>>> +	rcu_barrier();
>>>>  
>>>>  	hash_unregister_thread();
>>>>  	free(thep);
>>>> @@ -1048,10 +1056,6 @@ void *zoo_reader(void *arg)
>>>>  		if (i >= ne)
>>>>  			i = i % ne + offset;
>>>>  	}
>>>> -	/* Really want rcu_barrier(), but missing from old liburcu versions. */
>>>> -	synchronize_rcu();
>>>> -	poll(NULL, 0, 100);
>>>> -	synchronize_rcu();
>>>>  
>>>>  	pap->nlookups = nlookups;
>>>>  	pap->nlookupfails = nlookupfails;
>>>> @@ -1136,15 +1140,19 @@ void *zoo_updater(void *arg)
>>>>  			quiescent_state();
>>>>  	}
>>>>  
>>>> +	rcu_barrier();
>>>>  	/* Test over, so remove all our elements from the hash table. */
>>>>  	for (i = 0; i < elperupdater; i++) {
>>>>  		if (!zheplist[i])
>>>>  			continue;
>>>>  		zoo_del(zheplist[i]);
>>>>  	}
>>>> +	rcu_barrier();
>>>> +
>>>>  	hash_unregister_thread();
>>>>  	pap->nadds = nadds;
>>>>  	pap->ndels = ndels;
>>>> +	free(zheplist);
>>>>  	return NULL;
>>>>  }
>>>>  
>>>> -- 
>>>> 2.7.4
>>>>
>>>>
>>>
>>
> 



