On 2018/12/31 13:03:07 -0800, Paul E. McKenney wrote:
> On Tue, Jan 01, 2019 at 12:15:23AM +0900, Akira Yokosawa wrote:
>> >From 52f5d218442eb64f2798335d56a1838f90d96d5f Mon Sep 17 00:00:00 2001
>> From: Akira Yokosawa <akiyks@xxxxxxxxx>
>> Date: Mon, 30 Dec 2018 22:54:43 +0900
>> Subject: [PATCH] EXP hashtorture.h: Avoid sporadic SIGSEGV in hash_bkt_rcu
>>
>> Commit 4e22bdc905ff ("Wait at end of test for call_rcu() to finish")
>> added a couple of synchronize_rcu()s in perftest_update()
>> and zoo_reader().
>>
>> However, there still remains sporadic SIGSEGV in
>>
>>     $ ./hash_bkt_rcu --perftest --nupdaters 3
>>
>> On the other hand,
>>
>>     $ ./hash_bkt_rcu --schroedinger --nupdaters 3
>>
>> does not show such issue. Just moving synchronize_rcu()s in
>> zoo_reader() to zoo_updater() does not resolve the
>> SIGSEGV.
>>
>>
>> This commit defines rcu_barrier() if not available,
>> and puts them at both before and after the final loop
>> of perftest_updater() and zoo_updater().
>>
>> It looks like this change can fix the above mentioned
>> SIGSEGV in "--perftest".
>>
>> [Tested on Ubuntu Xenial with liburcu-dev/xenial,now 0.9.1-3 and
>> liburcu4/xenial,now 0.9.1-3 installed.]
>>
>> NOTE:
>>
>>     $ ./hash_resize --schroedinger --resizemult 2 --duration 20
>
> I get SIGSEGV and hangs from time to time, so I am looking into this.
> Thank you for calling it to my attention!

I've found some suspicious code in hash_resize.c.

hashtab_lock_mod() takes an ongoing resize into account and spin_lock()s
the new bucket if necessary. This is good for add, but for delete we may
still need to lock the old bucket. And hashtab_unlock_mod() does not take
an ongoing resize into account, so there can be a mismatch between
spin_lock() and spin_unlock().

Also, htp_master->ht_cur can change during the hashtab_lock_mod() --
hashtab_unlock_mod() critical section, because the pointer is updated by
rcu_assign_pointer() ahead of synchronize_rcu().
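To make the mismatch concrete, here is a minimal single-threaded sketch.
It is not the hash_resize.c code: the toy_* types and functions are
hypothetical stand-ins, the "locks" are plain flags, and the resize is
simulated by switching the cur pointer between lock and unlock. It only
illustrates how re-reading the current table on the unlock side can
release a different bucket than the one that was locked, and how handing
the locked bucket to the unlock side keeps the pair matched. It does not
address the delete case that needs the old bucket's lock.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins (assumptions); NOT the definitions in hash_resize.c. */
struct toy_bucket { int locked; };		/* 1 while "spin-locked" */
struct toy_ht { unsigned long nbuckets; struct toy_bucket bkt[4]; };
struct toy_master { struct toy_ht *cur; };	/* plays the role of ht_cur */

/* Look up the bucket for key in whatever table is current right now. */
static struct toy_bucket *toy_bucket_of(struct toy_master *m, unsigned long key)
{
	struct toy_ht *htp = m->cur;

	return &htp->bkt[key % htp->nbuckets];
}

/* Mismatch-prone pair: each side looks the bucket up independently. */
static void toy_lock_mod(struct toy_master *m, unsigned long key)
{
	toy_bucket_of(m, key)->locked = 1;
}

static void toy_unlock_mod(struct toy_master *m, unsigned long key)
{
	toy_bucket_of(m, key)->locked = 0;
}

/* Paired variant: lock returns the bucket, unlock releases exactly it. */
static struct toy_bucket *toy_lock_mod_paired(struct toy_master *m,
					      unsigned long key)
{
	struct toy_bucket *b = toy_bucket_of(m, key);

	b->locked = 1;
	return b;
}

static void toy_unlock_mod_paired(struct toy_bucket *b)
{
	b->locked = 0;
}

/* Lock, switch tables (simulated resize), unlock; count leaked locks. */
static int toy_leaked_locks(int paired)
{
	struct toy_ht old_ht = { .nbuckets = 4 };
	struct toy_ht new_ht = { .nbuckets = 4 };
	struct toy_master m = { .cur = &old_ht };
	unsigned long key = 5;
	int i, leaked = 0;

	if (paired) {
		struct toy_bucket *b = toy_lock_mod_paired(&m, key);

		m.cur = &new_ht;	/* resize publishes the new table */
		toy_unlock_mod_paired(b);
	} else {
		toy_lock_mod(&m, key);
		m.cur = &new_ht;	/* resize publishes the new table */
		toy_unlock_mod(&m, key);	/* "unlocks" the new table's bucket */
	}
	for (i = 0; i < 4; i++)
		leaked += old_ht.bkt[i].locked + new_ht.bkt[i].locked;
	return leaked;
}
```

With the independent lookups, toy_leaked_locks(0) leaves one bucket of
the old table locked; with the paired variant, toy_leaked_locks(1)
leaves none.
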
Given that resizing is infrequent, the simplest way might be to block
hashtab_lock_mod() while resizing is going on. There may be a better way
that keeps add/del/resize concurrent, though.

Happy hacking! ;-)

        Thanks, Akira

>
>> still fails with SIGSEGV frequently in zoo_del(). GDB says:
>>
>> (gdb) where
>> #0  0x0000000000402b27 in cds_list_del_rcu (elem=0x7ff8fc0138f0)
>>     at /usr/include/urcu/rculist.h:71
>> #1  hashtab_del (htep=0x7ff8fc0138d0, htp_master=<optimized out>)
>>     at hash_resize.c:261
>> #2  zoo_del (zhep=0x7ff8fc0138d0) at hashtorture.h:1007
>> #3  zoo_updater (arg=0x1e8b298) at hashtorture.h:1153
>> #4  0x00007ff9057d16ba in start_thread (arg=0x7ff903fed700)
>>     at pthread_create.c:333
>> #5  0x00007ff9050f741d in clone ()
>>     at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
>>
>> Signed-off-by: Akira Yokosawa <akiyks@xxxxxxxxx>
>
> Good catch, queue and pushed, thank you!
>
> With one small modification -- given that liburcu has had rcu_barrier()
> for some years now, I removed the "training wheels" (and unreliable)
> use of the wait and pair of synchronize_rcu() calls.
>
>> ---
>> Hi Paul,
>>
>> This is a partial fix, but it resolves SIGSEGV in "--perftest" of
>> hash_bkt_rcu and hash_resize.
>>
>> "--schroedinger" of hash_resize with resizing enabled still seg faults
>> as mentioned in the commit log.
>>
>> By the way, what version of liburcu are you using?
>
> It is about two years old, but it does have rcu_barrier().
>
> 							Thanx, Paul
>
>>         Thanks, Akira
>> --
>>  CodeSamples/datastruct/hash/hashtorture.h | 24 ++++++++++++++++--------
>>  1 file changed, 16 insertions(+), 8 deletions(-)
>>
>> diff --git a/CodeSamples/datastruct/hash/hashtorture.h b/CodeSamples/datastruct/hash/hashtorture.h
>> index 0e90220..9ae3dfa 100644
>> --- a/CodeSamples/datastruct/hash/hashtorture.h
>> +++ b/CodeSamples/datastruct/hash/hashtorture.h
>> @@ -55,6 +55,15 @@ void (*defer_del_done)(struct ht_elem *htep) = NULL;
>>  #ifndef quiescent_state
>>  #define quiescent_state() do ; while (0)
>>  #define synchronize_rcu() do ; while (0)
>> +#define rcu_barrier() do ; while (0)
>> +#else
>> +#ifndef rcu_barrier
>> +#define rcu_barrier() do { \
>> +	synchronize_rcu(); \
>> +	poll(NULL, 0, 100); \
>> +	synchronize_rcu(); \
>> +	} while (0)
>> +#endif /* #ifndef rcu_barrier */
>>  #endif /* #ifndef quiescent_state */
>>  
>>  /*
>> @@ -765,6 +774,7 @@ void *perftest_reader(void *arg)
>>  		if (i >= ne)
>>  			i = i % ne + offset;
>>  	}
>> +
>>  	pap->nlookups = nlookups;
>>  	pap->nlookupfails = nlookupfails;
>>  	hash_unregister_thread();
>> @@ -839,6 +849,7 @@ void *perftest_updater(void *arg)
>>  		quiescent_state();
>>  	}
>>  
>> +	rcu_barrier();
>>  	/* Test over, so remove all our elements from the hash table. */
>>  	for (i = 0; i < elperupdater; i++) {
>>  		if (thep[i].in_table != 1)
>> @@ -846,10 +857,7 @@ void *perftest_updater(void *arg)
>>  		BUG_ON(!perftest_lookup(thep[i].data));
>>  		perftest_del(&thep[i]);
>>  	}
>> -	/* Really want rcu_barrier(), but missing from old liburcu versions. */
>> -	synchronize_rcu();
>> -	poll(NULL, 0, 100);
>> -	synchronize_rcu();
>> +	rcu_barrier();
>>  
>>  	hash_unregister_thread();
>>  	free(thep);
>> @@ -1048,10 +1056,6 @@ void *zoo_reader(void *arg)
>>  		if (i >= ne)
>>  			i = i % ne + offset;
>>  	}
>> -	/* Really want rcu_barrier(), but missing from old liburcu versions. */
>> -	synchronize_rcu();
>> -	poll(NULL, 0, 100);
>> -	synchronize_rcu();
>>  
>>  	pap->nlookups = nlookups;
>>  	pap->nlookupfails = nlookupfails;
>> @@ -1136,15 +1140,19 @@ void *zoo_updater(void *arg)
>>  		quiescent_state();
>>  	}
>>  
>> +	rcu_barrier();
>>  	/* Test over, so remove all our elements from the hash table. */
>>  	for (i = 0; i < elperupdater; i++) {
>>  		if (!zheplist[i])
>>  			continue;
>>  		zoo_del(zheplist[i]);
>>  	}
>> +	rcu_barrier();
>> +
>>  	hash_unregister_thread();
>>  	pap->nadds = nadds;
>>  	pap->ndels = ndels;
>> +	free(zheplist);
>>  	return NULL;
>>  }
>>  
>> --
>> 2.7.4
>>
>>
>