Hi,
The environment is no longer available, but I collected the stack traces of all glusterd threads with the gdb command "thread apply all bt":
Using host libthread_db library "/lib64/libthread_db.so.1".
0x00007f9ee9fcfa3d in __pthread_timedjoin_ex () from /lib64/libpthread.so.0
Missing separate debuginfos, use: dnf debuginfo-install rcp-pack-glusterfs-1.12.0_1_gc999db1-RCP2.wf29.x86_64
(gdb) thread apply all bt
Thread 9 (Thread 0x7f9edf7fe700 (LWP 1933)):
#0 0x00007f9ee9fd785c in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x00007f9ee9fda657 in __lll_lock_elision () from /lib64/libpthread.so.0
#2 0x00007f9eeb24cae6 in iobref_unref (iobref=0x7f9ed00063b0) at iobuf.c:944
#3 0x00007f9eeafd2f29 in rpc_transport_pollin_destroy (pollin=0x7f9ed00452d0) at rpc-transport.c:123
#4 0x00007f9ee4fbf319 in socket_event_poll_in (this=0x7f9ed0049cc0, notify_handled=_gf_true) at socket.c:2322
#5 0x00007f9ee4fbf932 in socket_event_handler (fd=36, idx=27, gen=4, data="" poll_in=1, poll_out=0, poll_err=0) at socket.c:2471
#6 0x00007f9eeb2825d4 in event_dispatch_epoll_handler (event_pool=0x17feb00, event=0x7f9edf7fde84) at event-epoll.c:583
#7 0x00007f9eeb2828ab in event_dispatch_epoll_worker (data="" at event-epoll.c:659
#8 0x00007f9ee9fce5da in start_thread () from /lib64/libpthread.so.0
#9 0x00007f9ee98a4eaf in clone () from /lib64/libc.so.6
Thread 8 (Thread 0x7f9edffff700 (LWP 1932)):
#0 0x00007f9ee9fd785c in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x00007f9ee9fd2b42 in __pthread_mutex_cond_lock () from /lib64/libpthread.so.0
#2 0x00007f9ee9fd44a8 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#3 0x00007f9ee4fbadab in socket_event_poll_err (this=0x7f9ed0049cc0, gen=4, idx=27) at socket.c:1201
#4 0x00007f9ee4fbf99c in socket_event_handler (fd=36, idx=27, gen=4, data="" poll_in=1, poll_out=0, poll_err=0) at socket.c:2480
#5 0x00007f9eeb2825d4 in event_dispatch_epoll_handler (event_pool=0x17feb00, event=0x7f9edfffee84) at event-epoll.c:583
#6 0x00007f9eeb2828ab in event_dispatch_epoll_worker (data="" at event-epoll.c:659
#7 0x00007f9ee9fce5da in start_thread () from /lib64/libpthread.so.0
#8 0x00007f9ee98a4eaf in clone () from /lib64/libc.so.6
Thread 7 (Thread 0x7f9ee49b3700 (LWP 1931)):
#0 0x00007f9ee9fd45bc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f9ee5e651b9 in hooks_worker (args=0x1813000) at glusterd-hooks.c:529
#2 0x00007f9ee9fce5da in start_thread () from /lib64/libpthread.so.0
#3 0x00007f9ee98a4eaf in clone () from /lib64/libc.so.6
Thread 6 (Thread 0x7f9ee692e700 (LWP 1762)):
#0 0x00007f9ee9fd497a in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f9eeb25d904 in syncenv_task (proc=0x1808e00) at syncop.c:603
#2 0x00007f9eeb25db9f in syncenv_processor (thdata=0x1808e00) at syncop.c:695
#3 0x00007f9ee9fce5da in start_thread () from /lib64/libpthread.so.0
#4 0x00007f9ee98a4eaf in clone () from /lib64/libc.so.6
Thread 5 (Thread 0x7f9ee712f700 (LWP 1761)):
#0 0x00007f9ee9fd497a in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f9eeb25d904 in syncenv_task (proc=0x1808a40) at syncop.c:603
#2 0x00007f9eeb25db9f in syncenv_processor (thdata=0x1808a40) at syncop.c:695
#3 0x00007f9ee9fce5da in start_thread () from /lib64/libpthread.so.0
#4 0x00007f9ee98a4eaf in clone () from /lib64/libc.so.6
Thread 4 (Thread 0x7f9ee7930700 (LWP 1760)):
#0 0x00007f9ee98725d0 in nanosleep () from /lib64/libc.so.6
#1 0x00007f9ee98724aa in sleep () from /lib64/libc.so.6
#2 0x00007f9eeb247fdf in pool_sweeper (arg=0x0) at mem-pool.c:481
#3 0x00007f9ee9fce5da in start_thread () from /lib64/libpthread.so.0
#4 0x00007f9ee98a4eaf in clone () from /lib64/libc.so.6
Thread 3 (Thread 0x7f9ee8131700 (LWP 1759)):
#0 0x00007f9ee97e3d7c in sigtimedwait () from /lib64/libc.so.6
#1 0x00007f9ee9fd8bac in sigwait () from /lib64/libpthread.so.0
#2 0x0000000000409ed7 in glusterfs_sigwaiter ()
#3 0x00007f9ee9fce5da in start_thread () from /lib64/libpthread.so.0
#4 0x00007f9ee98a4eaf in clone () from /lib64/libc.so.6
Thread 2 (Thread 0x7f9ee8932700 (LWP 1758)):
#0 0x00007f9ee9fd83b0 in nanosleep () from /lib64/libpthread.so.0
#1 0x00007f9eeb224545 in gf_timer_proc (data="" at timer.c:164
#2 0x00007f9ee9fce5da in start_thread () from /lib64/libpthread.so.0
#3 0x00007f9ee98a4eaf in clone () from /lib64/libc.so.6
Thread 1 (Thread 0x7f9eeb707780 (LWP 1757)):
#0 0x00007f9ee9fcfa3d in __pthread_timedjoin_ex () from /lib64/libpthread.so.0
#1 0x00007f9eeb282b09 in event_dispatch_epoll (event_pool=0x17feb00) at event-epoll.c:746
#2 0x00007f9eeb246786 in event_dispatch (event_pool=0x17feb00) at event.c:124
#3 0x000000000040ab95 in main ()
(gdb) quit
From: Sanju Rakonde <srakonde@xxxxxxxxxx>
Sent: Monday, April 08, 2019 4:58 PM
To: Zhou, Cynthia (NSB - CN/Hangzhou) <cynthia.zhou@xxxxxxxxxxxxxxx>
Cc: Raghavendra Gowdappa <rgowdapp@xxxxxxxxxx>; gluster-devel@xxxxxxxxxxx
Subject: Re: [Gluster-devel] glusterd stuck for glusterfs with version 3.12.15
Can you please capture the output of "pstack $(pidof glusterd)" and send it to us? We need this information captured while glusterd is stuck.
Hi glusterfs experts,
Good day!
In my test environment, glusterd sometimes gets stuck and stops responding to any gluster command. When I checked, I found that glusterd threads 9 and 8 are handling the same socket. I thought the following patch would solve this, but the issue still occurs after I merged it. Looking at the code, socket_event_poll_in seems to call event_handled before rpc_transport_pollin_destroy, which I think gives another poll a chance to run on exactly the same socket and causes glusterd to get stuck. I also found that iobref_destroy has no LOCK_DESTROY(&iobref->lock); I think it would be better to destroy the lock there.
The gdb info captured when this issue happened is below. I would like to know your opinion on this issue, thanks!
SHA-1: f747d55a7fd364e2b9a74fe40360ab3cb7b11537
* socket: fix issue on concurrent handle of a socket
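To illustrate the ordering concern described above (event_handled being called before rpc_transport_pollin_destroy), here is a minimal single-threaded C sketch. It is a toy model, not the GlusterFS event code: demo_slot_t, demo_rearm, demo_release and the two demo_handle_* functions are hypothetical names, and the model only counts whether the slot is ever re-armed while the per-event object is still live. It does not prove that this window is the cause of the hang.

```c
#include <stdbool.h>

/* Hypothetical toy model of one event slot; NOT the GlusterFS event-epoll code. */
typedef struct demo_slot {
    bool armed;       /* true: another worker may be dispatched for this fd */
    bool pollin_live; /* true: the per-event object has not been released yet */
    int overlap_seen; /* times the slot was armed while the object was live */
} demo_slot_t;

/* Record the window in which a second worker could enter the handler. */
static void demo_observe(demo_slot_t *s)
{
    if (s->armed && s->pollin_live)
        s->overlap_seen++;
}

/* Stand-in for "mark the event as handled" (assumption: this re-arms the fd). */
static void demo_rearm(demo_slot_t *s)
{
    s->armed = true;
    demo_observe(s);
}

/* Stand-in for releasing the per-event object. */
static void demo_release(demo_slot_t *s)
{
    s->pollin_live = false;
    demo_observe(s);
}

/* Ordering described in the report: mark handled first, release afterwards. */
int demo_handle_rearm_first(void)
{
    demo_slot_t s = { .armed = false, .pollin_live = true, .overlap_seen = 0 };
    demo_rearm(&s);
    demo_release(&s);
    return s.overlap_seen;
}

/* Alternative ordering: release first, then mark handled. */
int demo_handle_release_first(void)
{
    demo_slot_t s = { .armed = false, .pollin_live = true, .overlap_seen = 0 };
    demo_release(&s);
    demo_rearm(&s);
    return s.overlap_seen;
}
```

In this model the first ordering leaves one point at which the slot is armed while the object is still live, and the second ordering leaves none. Whether the real code can be reordered this way depends on details not shown in this thread.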
GDB INFO:
I think thread 8 is blocked in pthread_cond_wait and thread 9 is blocked in iobref_unref:
Thread 9 (Thread 0x7f9edf7fe700 (LWP 1933)):
#0 0x00007f9ee9fd785c in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x00007f9ee9fda657 in __lll_lock_elision () from /lib64/libpthread.so.0
#2 0x00007f9eeb24cae6 in iobref_unref (iobref=0x7f9ed00063b0) at iobuf.c:944
#3 0x00007f9eeafd2f29 in rpc_transport_pollin_destroy (pollin=0x7f9ed00452d0) at rpc-transport.c:123
#4 0x00007f9ee4fbf319 in socket_event_poll_in (this=0x7f9ed0049cc0, notify_handled=_gf_true) at socket.c:2322
#5 0x00007f9ee4fbf932 in socket_event_handler (fd=36, idx=27, gen=4, data="" poll_in=1, poll_out=0, poll_err=0) at socket.c:2471
#6 0x00007f9eeb2825d4 in event_dispatch_epoll_handler (event_pool=0x17feb00, event=0x7f9edf7fde84) at event-epoll.c:583
#7 0x00007f9eeb2828ab in event_dispatch_epoll_worker (data="" at event-epoll.c:659
#8 0x00007f9ee9fce5da in start_thread () from /lib64/libpthread.so.0
#9 0x00007f9ee98a4eaf in clone () from /lib64/libc.so.6
Thread 8 (Thread 0x7f9edffff700 (LWP 1932)):
#0 0x00007f9ee9fd785c in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x00007f9ee9fd2b42 in __pthread_mutex_cond_lock () from /lib64/libpthread.so.0
#2 0x00007f9ee9fd44a8 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#3 0x00007f9ee4fbadab in socket_event_poll_err (this=0x7f9ed0049cc0, gen=4, idx=27) at socket.c:1201
#4 0x00007f9ee4fbf99c in socket_event_handler (fd=36, idx=27, gen=4, data="" poll_in=1, poll_out=0, poll_err=0) at socket.c:2480
#5 0x00007f9eeb2825d4 in event_dispatch_epoll_handler (event_pool=0x17feb00, event=0x7f9edfffee84) at event-epoll.c:583
#6 0x00007f9eeb2828ab in event_dispatch_epoll_worker (data="" at event-epoll.c:659
#7 0x00007f9ee9fce5da in start_thread () from /lib64/libpthread.so.0
#8 0x00007f9ee98a4eaf in clone () from /lib64/libc.so.6
(gdb) thread 9
[Switching to thread 9 (Thread 0x7f9edf7fe700 (LWP 1933))]
#0 0x00007f9ee9fd785c in __lll_lock_wait () from /lib64/libpthread.so.0
(gdb) bt
#0 0x00007f9ee9fd785c in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x00007f9ee9fda657 in __lll_lock_elision () from /lib64/libpthread.so.0
#2 0x00007f9eeb24cae6 in iobref_unref (iobref=0x7f9ed00063b0) at iobuf.c:944
#3 0x00007f9eeafd2f29 in rpc_transport_pollin_destroy (pollin=0x7f9ed00452d0) at rpc-transport.c:123
#4 0x00007f9ee4fbf319 in socket_event_poll_in (this=0x7f9ed0049cc0, notify_handled=_gf_true) at socket.c:2322
#5 0x00007f9ee4fbf932 in socket_event_handler (fd=36, idx=27, gen=4, data="" poll_in=1, poll_out=0, poll_err=0) at socket.c:2471
#6 0x00007f9eeb2825d4 in event_dispatch_epoll_handler (event_pool=0x17feb00, event=0x7f9edf7fde84) at event-epoll.c:583
#7 0x00007f9eeb2828ab in event_dispatch_epoll_worker (data="" at event-epoll.c:659
#8 0x00007f9ee9fce5da in start_thread () from /lib64/libpthread.so.0
#9 0x00007f9ee98a4eaf in clone () from /lib64/libc.so.6
(gdb) frame 2
#2 0x00007f9eeb24cae6 in iobref_unref (iobref=0x7f9ed00063b0) at iobuf.c:944
944 iobuf.c: No such file or directory.
(gdb) print *iobref
$1 = {lock = {spinlock = 2, mutex = {__data = {__lock = 2, __count = 222, __owner = -2120437760, __nusers = 1, __kind = 8960, __spins = 512,
__elision = 0, __list = {__prev = 0x4000, __next = 0x7f9ed00063b000}},
__size = "\002\000\000\000\336\000\000\000\000\260\234\201\001\000\000\000\000#\000\000\000\002\000\000\000@\000\000\000\000\000\000\000\260c\000\320\236\177",
__align = 953482739714}}, ref = -256, iobrefs = 0xffffffffffffffff, alloced = -1, used = -1}
(gdb) quit
A debugging session is active.
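Regarding the suggestion to add LOCK_DESTROY(&iobref->lock) in iobref_destroy, the general pattern can be sketched with plain pthreads as follows. This is only an illustration under assumptions: demo_ref_t and the demo_ref_* functions are hypothetical names, not the GlusterFS iobref API, and the sketch assumes the lock is a pthread mutex.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical reference-counted object; NOT the GlusterFS iobref. */
typedef struct demo_ref {
    pthread_mutex_t lock;
    int ref;
} demo_ref_t;

/* Allocate the object with one reference held by the caller. */
demo_ref_t *demo_ref_new(void)
{
    demo_ref_t *r = calloc(1, sizeof(*r));
    if (r == NULL)
        return NULL;
    if (pthread_mutex_init(&r->lock, NULL) != 0) {
        free(r);
        return NULL;
    }
    r->ref = 1;
    return r;
}

/* Take an additional reference. */
demo_ref_t *demo_ref_ref(demo_ref_t *r)
{
    pthread_mutex_lock(&r->lock);
    r->ref++;
    pthread_mutex_unlock(&r->lock);
    return r;
}

/* Drop one reference. Returns 1 if this call freed the object, else 0. */
int demo_ref_unref(demo_ref_t *r)
{
    int remaining;

    pthread_mutex_lock(&r->lock);
    remaining = --r->ref;
    pthread_mutex_unlock(&r->lock);

    if (remaining > 0)
        return 0;

    /* Last reference is gone: the mutex is unlocked and no other holder
     * exists, so it can be destroyed before the memory is freed. */
    pthread_mutex_destroy(&r->lock);
    free(r);
    return 1;
}
```

Destroying the lock in the destroy path keeps initialization and teardown symmetric. On its own it would not prevent a thread from locking an object that another thread has already freed.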