----- Original Message -----
> From: "Serkan Çoban" <cobanserkan@xxxxxxxxx>
> To: "Ben Turner" <bturner@xxxxxxxxxx>
> Cc: "Gluster Users" <gluster-users@xxxxxxxxxxx>
> Sent: Sunday, September 3, 2017 2:55:06 PM
> Subject: Re: Glusterd proccess hangs on reboot
>
> I usually change event threads to 4, but those logs are from a default
> installation.

Yep, me too. I did a lot of the qualification for multi-threaded epoll, and 4
is what I found best saturates my back end (12-disk RAID 6 spinners) without
wasting threads. Be careful tuning this too high if you have a lot of bricks
per server; you could run into contention with all of those threads fighting
for CPU time.

On the hooks side, on my system I have:

-rwxr-xr-x. 1 root root 1459 Jun  1 06:35 S29CTDB-teardown.sh
-rwxr-xr-x. 1 root root 1736 Jun  1 06:35 S30samba-stop.sh

Do you have SMB installed on these systems? IIRC the scripts only run if the
service is enabled with chkconfig; if you don't have SMB installed and enabled,
I don't think these are the problem.

-b

>
> On Sun, Sep 3, 2017 at 9:52 PM, Ben Turner <bturner@xxxxxxxxxx> wrote:
> > ----- Original Message -----
> >> From: "Ben Turner" <bturner@xxxxxxxxxx>
> >> To: "Serkan Çoban" <cobanserkan@xxxxxxxxx>
> >> Cc: "Gluster Users" <gluster-users@xxxxxxxxxxx>
> >> Sent: Sunday, September 3, 2017 2:30:31 PM
> >> Subject: Re: Glusterd proccess hangs on reboot
> >>
> >> ----- Original Message -----
> >> > From: "Milind Changire" <mchangir@xxxxxxxxxx>
> >> > To: "Serkan Çoban" <cobanserkan@xxxxxxxxx>
> >> > Cc: "Gluster Users" <gluster-users@xxxxxxxxxxx>
> >> > Sent: Saturday, September 2, 2017 11:44:40 PM
> >> > Subject: Re: Glusterd proccess hangs on reboot
> >> >
> >> > No worries Serkan,
> >> > You can continue to use your 40 node clusters.
> >> >
> >> > The backtrace has resolved the function names and it should be
> >> > sufficient to debug the issue.
> >> > Thanks for letting us know.
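Ben's question about which hook scripts actually run can be checked one directory at a time. The following is a minimal Python sketch, assuming glusterd only runs hook scripts that are executable, whose names start with "S", and that are not package-manager leftovers (`*.rpmsave` / `*.rpmnew`), in lexical order. That rule, and an example directory such as `/var/lib/glusterd/hooks/1/stop/pre`, are assumptions to verify against your installed version, not something confirmed in this thread. The sketch also cannot tell whether a script exits early because a service such as SMB is not enabled.

```python
import stat
from pathlib import Path
from typing import List

# Any of the three execute bits counts as "executable" for this check.
_EXEC_BITS = stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH


def eligible_hook_scripts(hook_dir: str) -> List[str]:
    """Return hook script names that look eligible to run, in run order.

    Assumed rule (not confirmed in the thread): a regular file with an
    execute bit, a name starting with "S", and no .rpmsave/.rpmnew suffix.
    """
    names = []
    for entry in Path(hook_dir).iterdir():
        name = entry.name
        if not name.startswith("S"):
            continue  # under the assumed rule, non-"S" names are skipped
        if name.endswith((".rpmsave", ".rpmnew")):
            continue  # package-manager leftovers
        if not entry.is_file():
            continue
        if not entry.stat().st_mode & _EXEC_BITS:
            continue  # no execute bit set for anyone
        names.append(name)
    return sorted(names)  # lexical order, e.g. S29... before S30...
```

Call it with the pre/post directory of the event you suspect (for example the volume-stop hooks) and compare the result with the scripts you expected to fire at shutdown.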
> >> >
> >> > We'll post on this thread again to notify you about the findings.
> >>
> >> One of the things I find interesting is seeing:
> >>
> >> #1 0x00007f928450099b in hooks_worker () from
> >>
> >> The "hooks" scripts are usually shell scripts that get run when volumes
> >> are started / stopped / etc. It may be worth looking into which hook
> >> scripts get run at shutdown and thinking about how one of them could hang
> >> the system. This may be a red herring, but I don't see much else going on
> >> in the stack trace that I looked at. The thread with the deepest stack is
> >> the hooks worker one; all of the others look to be in some sort of
> >> wait / sleep / listen state.
> >
> > Sorry, the hooks call doesn't have the deepest stack; I didn't see the
> > other thread below it.
> >
> > In the logs I see:
> >
> > [2017-08-22 10:53:39.267860] I [socket.c:2426:socket_event_handler] 0-transport: EPOLLERR - disconnecting now
> >
> > You mentioned changing event threads? Event threads controls the number of
> > epoll listener threads; what did you change it to? IIRC 2 is the default
> > value. This may be some sort of race condition? Just my $0.02.
> >
> > -b
> >
> >>
> >> -b
> >>
> >> >
> >> > On Sat, Sep 2, 2017 at 2:42 PM, Serkan Çoban <cobanserkan@xxxxxxxxx>
> >> > wrote:
> >> >
> >> > Hi Milind,
> >> >
> >> > Is there anything new on the issue? Were you able to find the problem,
> >> > or is there anything else you need?
> >> > I will continue with two clusters of 40 servers each, so I will not be
> >> > able to provide any further info for 80 servers.
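Ben's method of setting aside every thread that is parked in a wait, sleep, or listen call can be automated. This Python sketch parses plain pstack/gdb output and keeps only threads whose innermost frame is not in a small, hand-picked set of blocking calls. That set (`WAITING_FUNCS`) is an assumption and will need extending for other processes. Applied to the Sep 1 sample quoted below, it reports only Thread 2 (`__strcmp_sse42` under `dict_lookup_common`); the hooks worker is parked in `pthread_cond_wait`.

```python
import re
from typing import Dict, List

# Assumed set of "parked" innermost frames: a thread stopped in one of these
# is treated as waiting / sleeping / listening rather than doing work.
WAITING_FUNCS = {
    "nanosleep", "sleep", "sigwait", "pthread_join", "epoll_wait",
    "pthread_cond_wait", "pthread_cond_timedwait",
}

THREAD_RE = re.compile(r"^Thread (\d+) ")
FRAME_RE = re.compile(r"^#(\d+)\s+(?:0x[0-9a-f]+ in )?(\S+) \(")


def parse_pstack(text: str) -> Dict[int, List[str]]:
    """Map thread number -> function names, innermost frame first."""
    threads: Dict[int, List[str]] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        m = THREAD_RE.match(line)
        if m:
            current = int(m.group(1))
            threads[current] = []
            continue
        m = FRAME_RE.match(line)
        if m and current is not None:
            # Drop symbol versions such as "@@GLIBC_2.3.2".
            threads[current].append(m.group(2).split("@@")[0])
    return threads


def busy_threads(text: str) -> Dict[int, List[str]]:
    """Keep only threads whose innermost frame is not a known wait call."""
    return {
        tid: frames
        for tid, frames in parse_pstack(text).items()
        if frames and frames[0] not in WAITING_FUNCS
    }
```

Running `busy_threads` over each of several samples and checking whether the same thread and call path keep showing up is a cheap way to separate a genuinely busy thread from a one-off snapshot.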
> >> >
> >> > On Fri, Sep 1, 2017 at 10:30 AM, Serkan Çoban <cobanserkan@xxxxxxxxx>
> >> > wrote:
> >> > > Hi,
> >> > > You can find pstack samples here:
> >> > > https://www.dropbox.com/s/6gw8b6tng8puiox/pstack_with_debuginfo.zip?dl=0
> >> > >
> >> > > Here is the first one:
> >> > > Thread 8 (Thread 0x7f92879ae700 (LWP 78909)):
> >> > > #0 0x0000003d99c0f00d in nanosleep () from /lib64/libpthread.so.0
> >> > > #1 0x000000310fe37d57 in gf_timer_proc () from /usr/lib64/libglusterfs.so.0
> >> > > #2 0x0000003d99c07aa1 in start_thread () from /lib64/libpthread.so.0
> >> > > #3 0x0000003d998e8bbd in clone () from /lib64/libc.so.6
> >> > > Thread 7 (Thread 0x7f9286fad700 (LWP 78910)):
> >> > > #0 0x0000003d99c0f585 in sigwait () from /lib64/libpthread.so.0
> >> > > #1 0x000000000040643b in glusterfs_sigwaiter ()
> >> > > #2 0x0000003d99c07aa1 in start_thread () from /lib64/libpthread.so.0
> >> > > #3 0x0000003d998e8bbd in clone () from /lib64/libc.so.6
> >> > > Thread 6 (Thread 0x7f92865ac700 (LWP 78911)):
> >> > > #0 0x0000003d998acc4d in nanosleep () from /lib64/libc.so.6
> >> > > #1 0x0000003d998acac0 in sleep () from /lib64/libc.so.6
> >> > > #2 0x000000310fe528fb in pool_sweeper () from /usr/lib64/libglusterfs.so.0
> >> > > #3 0x0000003d99c07aa1 in start_thread () from /lib64/libpthread.so.0
> >> > > #4 0x0000003d998e8bbd in clone () from /lib64/libc.so.6
> >> > > Thread 5 (Thread 0x7f9285bab700 (LWP 78912)):
> >> > > #0 0x0000003d99c0ba5e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
> >> > > #1 0x000000310fe64afc in syncenv_task () from /usr/lib64/libglusterfs.so.0
> >> > > #2 0x000000310fe729f0 in syncenv_processor () from /usr/lib64/libglusterfs.so.0
> >> > > #3 0x0000003d99c07aa1 in start_thread () from /lib64/libpthread.so.0
> >> > > #4 0x0000003d998e8bbd in clone () from /lib64/libc.so.6
> >> > > Thread 4 (Thread 0x7f92851aa700 (LWP 78913)):
> >> > > #0 0x0000003d99c0ba5e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
> >> > > #1 0x000000310fe64afc in syncenv_task () from /usr/lib64/libglusterfs.so.0
> >> > > #2 0x000000310fe729f0 in syncenv_processor () from /usr/lib64/libglusterfs.so.0
> >> > > #3 0x0000003d99c07aa1 in start_thread () from /lib64/libpthread.so.0
> >> > > #4 0x0000003d998e8bbd in clone () from /lib64/libc.so.6
> >> > > Thread 3 (Thread 0x7f9282ecc700 (LWP 78915)):
> >> > > #0 0x0000003d99c0b68c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
> >> > > #1 0x00007f928450099b in hooks_worker () from /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
> >> > > #2 0x0000003d99c07aa1 in start_thread () from /lib64/libpthread.so.0
> >> > > #3 0x0000003d998e8bbd in clone () from /lib64/libc.so.6
> >> > > Thread 2 (Thread 0x7f92824cb700 (LWP 78916)):
> >> > > #0 0x0000003d9992867a in __strcmp_sse42 () from /lib64/libc.so.6
> >> > > #1 0x000000310fe2244a in dict_lookup_common () from /usr/lib64/libglusterfs.so.0
> >> > > #2 0x000000310fe2433d in dict_set_lk () from /usr/lib64/libglusterfs.so.0
> >> > > #3 0x000000310fe245f5 in dict_set () from /usr/lib64/libglusterfs.so.0
> >> > > #4 0x000000310fe2524c in dict_set_str () from /usr/lib64/libglusterfs.so.0
> >> > > #5 0x00007f928453a8c4 in gd_add_brick_snap_details_to_dict () from /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
> >> > > #6 0x00007f928447b0df in glusterd_add_volume_to_dict () from /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
> >> > > #7 0x00007f928447b47c in glusterd_add_volumes_to_export_dict () from /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
> >> > > #8 0x00007f9284491edf in glusterd_rpc_friend_add () from /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
> >> > > #9 0x00007f92844528f7 in glusterd_ac_friend_add () from /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
> >> > > #10 0x00007f9284450bb9 in glusterd_friend_sm () from /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
> >> > > #11 0x00007f92844ac89a in __glusterd_mgmt_hndsk_version_ack_cbk () from /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
> >> > > #12 0x00007f92844923ee in glusterd_big_locked_cbk () from /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
> >> > > #13 0x000000311020fad5 in rpc_clnt_handle_reply () from /usr/lib64/libgfrpc.so.0
> >> > > #14 0x0000003110210c85 in rpc_clnt_notify () from /usr/lib64/libgfrpc.so.0
> >> > > #15 0x000000311020bd68 in rpc_transport_notify () from /usr/lib64/libgfrpc.so.0
> >> > > #16 0x00007f9283492ccd in socket_event_poll_in () from /usr/lib64/glusterfs/3.10.5/rpc-transport/socket.so
> >> > > #17 0x00007f9283493ffe in socket_event_handler () from /usr/lib64/glusterfs/3.10.5/rpc-transport/socket.so
> >> > > #18 0x000000310fe87806 in event_dispatch_epoll_worker () from /usr/lib64/libglusterfs.so.0
> >> > > #19 0x0000003d99c07aa1 in start_thread () from /lib64/libpthread.so.0
> >> > > #20 0x0000003d998e8bbd in clone () from /lib64/libc.so.6
> >> > > Thread 1 (Thread 0x7f928e4a4740 (LWP 78908)):
> >> > > #0 0x0000003d99c082fd in pthread_join () from /lib64/libpthread.so.0
> >> > > #1 0x000000310fe872d5 in event_dispatch_epoll () from /usr/lib64/libglusterfs.so.0
> >> > > #2 0x0000000000409020 in main ()
> >> > >
> >> > > On Fri, Sep 1, 2017 at 6:17 AM, Milind Changire <mchangir@xxxxxxxxxx>
> >> > > wrote:
> >> > >> Serkan,
> >> > >> I have gone through the other mails in this thread as well, but I am
> >> > >> responding to this one specifically.
> >> > >>
> >> > >> Is this a source install or an RPM install?
> >> > >> If this is an RPM install, could you please install the
> >> > >> glusterfs-debuginfo RPM and retry capturing the gdb backtrace.
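Milind's request, combined with Serkan's method of taking 10 samples 10 seconds apart, can be scripted. A minimal Python sketch, assuming the `pstack` utility is installed and the glusterfs-debuginfo package is present so that frames resolve to function names. The capture and sleep functions are injectable so the loop can be exercised without a live process; `pstack_capture` and `sample_stacks` are hypothetical helper names, not part of any Gluster tooling.

```python
import subprocess
import time
from typing import Callable, List


def pstack_capture(pid: int) -> str:
    """Take one stack dump of `pid` with the pstack tool (assumed installed)."""
    return subprocess.run(
        ["pstack", str(pid)], capture_output=True, text=True, check=True
    ).stdout


def sample_stacks(
    pid: int,
    count: int = 10,
    interval_s: float = 10.0,
    capture: Callable[[int], str] = pstack_capture,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Return `count` stack dumps of `pid`, pausing `interval_s` between them."""
    samples = []
    for i in range(count):
        samples.append(capture(pid))
        if i < count - 1:
            sleep(interval_s)  # no pause after the last sample
    return samples
```

Pass the glusterd PID (for example from `pidof glusterd`) and write each returned dump to its own file before sharing; the defaults match the 10 × 10-second sampling used in this thread.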
> >> > >>
> >> > >> If this is a source install, then you'll need to configure the build
> >> > >> with --enable-debug, reinstall, and retry capturing the gdb backtrace.
> >> > >>
> >> > >> Having the debuginfo package or a debug build helps to resolve the
> >> > >> function names and/or line numbers.
> >> > >> --
> >> > >> Milind
> >> > >>
> >> > >> On Thu, Aug 24, 2017 at 11:19 AM, Serkan Çoban <cobanserkan@xxxxxxxxx>
> >> > >> wrote:
> >> > >>>
> >> > >>> Here you can find 10 stack trace samples from glusterd. I waited 10
> >> > >>> seconds between traces.
> >> > >>> https://www.dropbox.com/s/9f36goq5xn3p1yt/glusterd_pstack.zip?dl=0
> >> > >>>
> >> > >>> Content of the first stack trace is here:
> >> > >>>
> >> > >>> Thread 8 (Thread 0x7f7a8cd4e700 (LWP 43069)):
> >> > >>> #0 0x0000003aa5c0f00d in nanosleep () from /lib64/libpthread.so.0
> >> > >>> #1 0x000000303f837d57 in ?? () from /usr/lib64/libglusterfs.so.0
> >> > >>> #2 0x0000003aa5c07aa1 in start_thread () from /lib64/libpthread.so.0
> >> > >>> #3 0x0000003aa58e8bbd in clone () from /lib64/libc.so.6
> >> > >>> Thread 7 (Thread 0x7f7a8c34d700 (LWP 43070)):
> >> > >>> #0 0x0000003aa5c0f585 in sigwait () from /lib64/libpthread.so.0
> >> > >>> #1 0x000000000040643b in glusterfs_sigwaiter ()
> >> > >>> #2 0x0000003aa5c07aa1 in start_thread () from /lib64/libpthread.so.0
> >> > >>> #3 0x0000003aa58e8bbd in clone () from /lib64/libc.so.6
> >> > >>> Thread 6 (Thread 0x7f7a8b94c700 (LWP 43071)):
> >> > >>> #0 0x0000003aa58acc4d in nanosleep () from /lib64/libc.so.6
> >> > >>> #1 0x0000003aa58acac0 in sleep () from /lib64/libc.so.6
> >> > >>> #2 0x000000303f8528fb in pool_sweeper () from /usr/lib64/libglusterfs.so.0
> >> > >>> #3 0x0000003aa5c07aa1 in start_thread () from /lib64/libpthread.so.0
> >> > >>> #4 0x0000003aa58e8bbd in clone () from /lib64/libc.so.6
> >> > >>> Thread 5 (Thread 0x7f7a8af4b700 (LWP 43072)):
> >> > >>> #0 0x0000003aa5c0ba5e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
> >> > >>> #1 0x000000303f864afc in syncenv_task () from /usr/lib64/libglusterfs.so.0
> >> > >>> #2 0x000000303f8729f0 in syncenv_processor () from /usr/lib64/libglusterfs.so.0
> >> > >>> #3 0x0000003aa5c07aa1 in start_thread () from /lib64/libpthread.so.0
> >> > >>> #4 0x0000003aa58e8bbd in clone () from /lib64/libc.so.6
> >> > >>> Thread 4 (Thread 0x7f7a8a54a700 (LWP 43073)):
> >> > >>> #0 0x0000003aa5c0ba5e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
> >> > >>> #1 0x000000303f864afc in syncenv_task () from /usr/lib64/libglusterfs.so.0
> >> > >>> #2 0x000000303f8729f0 in syncenv_processor () from /usr/lib64/libglusterfs.so.0
> >> > >>> #3 0x0000003aa5c07aa1 in start_thread () from /lib64/libpthread.so.0
> >> > >>> #4 0x0000003aa58e8bbd in clone () from /lib64/libc.so.6
> >> > >>> Thread 3 (Thread 0x7f7a886ac700 (LWP 43075)):
> >> > >>> #0 0x0000003aa5c0b68c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
> >> > >>> #1 0x00007f7a898a099b in ?? () from /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
> >> > >>> #2 0x0000003aa5c07aa1 in start_thread () from /lib64/libpthread.so.0
> >> > >>> #3 0x0000003aa58e8bbd in clone () from /lib64/libc.so.6
> >> > >>> Thread 2 (Thread 0x7f7a87cab700 (LWP 43076)):
> >> > >>> #0 0x0000003aa5928692 in __strcmp_sse42 () from /lib64/libc.so.6
> >> > >>> #1 0x000000303f82244a in ?? () from /usr/lib64/libglusterfs.so.0
> >> > >>> #2 0x000000303f82433d in ?? () from /usr/lib64/libglusterfs.so.0
> >> > >>> #3 0x000000303f8245f5 in dict_set () from /usr/lib64/libglusterfs.so.0
> >> > >>> #4 0x000000303f82524c in dict_set_str () from /usr/lib64/libglusterfs.so.0
> >> > >>> #5 0x00007f7a898da7fd in ?? () from /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
> >> > >>> #6 0x00007f7a8981b0df in ?? () from /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
> >> > >>> #7 0x00007f7a8981b47c in ?? () from /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
> >> > >>> #8 0x00007f7a89831edf in ?? () from /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
> >> > >>> #9 0x00007f7a897f28f7 in ?? () from /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
> >> > >>> #10 0x00007f7a897f0bb9 in ?? () from /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
> >> > >>> #11 0x00007f7a8984c89a in ?? () from /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
> >> > >>> #12 0x00007f7a898323ee in ?? () from /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
> >> > >>> #13 0x000000303f40fad5 in rpc_clnt_handle_reply () from /usr/lib64/libgfrpc.so.0
> >> > >>> #14 0x000000303f410c85 in rpc_clnt_notify () from /usr/lib64/libgfrpc.so.0
> >> > >>> #15 0x000000303f40bd68 in rpc_transport_notify () from /usr/lib64/libgfrpc.so.0
> >> > >>> #16 0x00007f7a88a6fccd in ?? () from /usr/lib64/glusterfs/3.10.5/rpc-transport/socket.so
> >> > >>> #17 0x00007f7a88a70ffe in ?? () from /usr/lib64/glusterfs/3.10.5/rpc-transport/socket.so
> >> > >>> #18 0x000000303f887806 in ?? () from /usr/lib64/libglusterfs.so.0
> >> > >>> #19 0x0000003aa5c07aa1 in start_thread () from /lib64/libpthread.so.0
> >> > >>> #20 0x0000003aa58e8bbd in clone () from /lib64/libc.so.6
> >> > >>> Thread 1 (Thread 0x7f7a93844740 (LWP 43068)):
> >> > >>> #0 0x0000003aa5c082fd in pthread_join () from /lib64/libpthread.so.0
> >> > >>> #1 0x000000303f8872d5 in ?? () from /usr/lib64/libglusterfs.so.0
> >> > >>> #2 0x0000000000409020 in main ()
> >> > >>>
> >> > >>> On Wed, Aug 23, 2017 at 8:46 PM, Atin Mukherjee <amukherj@xxxxxxxxxx>
> >> > >>> wrote:
> >> > >>> > Could you provide a pstack dump of the glusterd process?
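Milind's point about debuginfo can be checked before a dump is sent. In the Aug 24 sample above, every glusterd.so frame prints as `??`, while the Sep 1 sample resolves every frame. The following small Python sketch counts unresolved frames in a pstack/gdb dump. Treating the literal text ` in ?? (` as the marker of an unresolved frame is an assumption based on the format of the dumps in this thread.

```python
import re
from typing import Tuple

FRAME_RE = re.compile(r"^#\d+\s")


def symbol_coverage(pstack_text: str) -> Tuple[int, int]:
    """Return (unresolved_frames, total_frames) for a pstack/gdb dump.

    A frame counts as unresolved when gdb printed "??" instead of a function
    name, which typically means no symbol was available for that address.
    """
    total = unresolved = 0
    for raw in pstack_text.splitlines():
        line = raw.strip()
        if not FRAME_RE.match(line):
            continue  # thread headers and blank lines are not frames
        total += 1
        if " in ?? (" in line:
            unresolved += 1
    return unresolved, total
```

A non-zero first value after installing glusterfs-debuginfo suggests the debuginfo version does not match the installed binaries.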
> >> > >>> >
> >> > >>> > On Wed, 23 Aug 2017 at 20:22, Atin Mukherjee <amukherj@xxxxxxxxxx>
> >> > >>> > wrote:
> >> > >>> >>
> >> > >>> >> Not yet. Gaurav will be taking a look at it tomorrow.
> >> > >>> >>
> >> > >>> >> On Wed, 23 Aug 2017 at 20:14, Serkan Çoban <cobanserkan@xxxxxxxxx>
> >> > >>> >> wrote:
> >> > >>> >>>
> >> > >>> >>> Hi Atin,
> >> > >>> >>>
> >> > >>> >>> Did you have time to check the logs?
> >> > >>> >>>
> >> > >>> >>> On Wed, Aug 23, 2017 at 10:02 AM, Serkan Çoban <cobanserkan@xxxxxxxxx>
> >> > >>> >>> wrote:
> >> > >>> >>> > The same thing happens with 3.12.rc0. This time perf top shows it
> >> > >>> >>> > hanging in libglusterfs.so, and below are the glusterd logs, which
> >> > >>> >>> > differ from those of 3.10.
> >> > >>> >>> > With 3.10.5, after 60-70 minutes CPU usage returns to normal, brick
> >> > >>> >>> > processes come online, and the system starts answering commands
> >> > >>> >>> > like "gluster peer status".
> >> > >>> >>> >
> >> > >>> >>> > [2017-08-23 06:46:02.150472] E [client_t.c:324:gf_client_ref] (-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1) [0x7f5ae2c091b1] -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c) [0x7f5ae2c0851c] -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9) [0x7f5ae2ea3949] ) 0-client_t: null client [Invalid argument]
> >> > >>> >>> > [2017-08-23 06:46:02.152181] E [client_t.c:324:gf_client_ref] (-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1) [0x7f5ae2c091b1] -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c) [0x7f5ae2c0851c] -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9) [0x7f5ae2ea3949] ) 0-client_t: null client [Invalid argument]
> >> > >>> >>> > [2017-08-23 06:46:02.152287] E [client_t.c:324:gf_client_ref] (-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1) [0x7f5ae2c091b1] -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c) [0x7f5ae2c0851c] -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9) [0x7f5ae2ea3949] ) 0-client_t: null client [Invalid argument]
> >> > >>> >>> > [2017-08-23 06:46:02.153503] E [client_t.c:324:gf_client_ref] (-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1) [0x7f5ae2c091b1] -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c) [0x7f5ae2c0851c] -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9) [0x7f5ae2ea3949] ) 0-client_t: null client [Invalid argument]
> >> > >>> >>> > [2017-08-23 06:46:02.153647] E [client_t.c:324:gf_client_ref] (-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1) [0x7f5ae2c091b1] -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c) [0x7f5ae2c0851c] -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9) [0x7f5ae2ea3949] ) 0-client_t: null client [Invalid argument]
> >> > >>> >>> > [2017-08-23 06:46:02.153866] E [client_t.c:324:gf_client_ref] (-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1) [0x7f5ae2c091b1] -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c) [0x7f5ae2c0851c] -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9) [0x7f5ae2ea3949] ) 0-client_t: null client [Invalid argument]
> >> > >>> >>> > [2017-08-23 06:46:02.153948] E [client_t.c:324:gf_client_ref] (-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1) [0x7f5ae2c091b1] -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c) [0x7f5ae2c0851c] -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9) [0x7f5ae2ea3949] ) 0-client_t: null client [Invalid argument]
> >> > >>> >>> > [2017-08-23 06:46:02.154018] E [client_t.c:324:gf_client_ref] (-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1) [0x7f5ae2c091b1] -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c) [0x7f5ae2c0851c] -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9) [0x7f5ae2ea3949] ) 0-client_t: null client [Invalid argument]
> >> > >>> >>> > [2017-08-23 06:46:02.154108] E [client_t.c:324:gf_client_ref] (-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1) [0x7f5ae2c091b1] -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c) [0x7f5ae2c0851c] -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9) [0x7f5ae2ea3949] ) 0-client_t: null client [Invalid argument]
> >> > >>> >>> > [2017-08-23 06:46:02.154162] E [client_t.c:324:gf_client_ref] (-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1) [0x7f5ae2c091b1] -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c) [0x7f5ae2c0851c] -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9) [0x7f5ae2ea3949] ) 0-client_t: null client [Invalid argument]
> >> > >>> >>> > [2017-08-23 06:46:02.154250] E [client_t.c:324:gf_client_ref] (-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1) [0x7f5ae2c091b1] -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c) [0x7f5ae2c0851c] -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9) [0x7f5ae2ea3949] ) 0-client_t: null client [Invalid argument]
> >> > >>> >>> > [2017-08-23 06:46:02.154322] E [client_t.c:324:gf_client_ref] (-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1) [0x7f5ae2c091b1] -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c) [0x7f5ae2c0851c] -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9) [0x7f5ae2ea3949] ) 0-client_t: null client [Invalid argument]
> >> > >>> >>> > [2017-08-23 06:46:02.154425] E [client_t.c:324:gf_client_ref] (-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1) [0x7f5ae2c091b1] -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c) [0x7f5ae2c0851c] -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9) [0x7f5ae2ea3949] ) 0-client_t: null client [Invalid argument]
> >> > >>> >>> > [2017-08-23 06:46:02.154494] E [client_t.c:324:gf_client_ref] (-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1) [0x7f5ae2c091b1] -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c) [0x7f5ae2c0851c] -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9) [0x7f5ae2ea3949] ) 0-client_t: null client [Invalid argument]
> >> > >>> >>> > [2017-08-23 06:46:02.154575] E [client_t.c:324:gf_client_ref] (-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1) [0x7f5ae2c091b1] -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c) [0x7f5ae2c0851c] -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9) [0x7f5ae2ea3949] ) 0-client_t: null client [Invalid argument]
> >> > >>> >>> > [2017-08-23 06:46:02.154649] E [client_t.c:324:gf_client_ref] (-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1) [0x7f5ae2c091b1] -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c) [0x7f5ae2c0851c] -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9) [0x7f5ae2ea3949] ) 0-client_t: null client [Invalid argument]
> >> > >>> >>> > [2017-08-23 06:46:02.154705] E [client_t.c:324:gf_client_ref] (-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1) [0x7f5ae2c091b1] -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c) [0x7f5ae2c0851c] -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9) [0x7f5ae2ea3949] ) 0-client_t: null client [Invalid argument]
> >> > >>> >>> > [2017-08-23 06:46:02.154774] E [client_t.c:324:gf_client_ref] (-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1) [0x7f5ae2c091b1] -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c) [0x7f5ae2c0851c] -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9) [0x7f5ae2ea3949] ) 0-client_t: null client [Invalid argument]
> >> > >>> >>> > [2017-08-23 06:46:02.154852] E [client_t.c:324:gf_client_ref] (-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1) [0x7f5ae2c091b1] -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c) [0x7f5ae2c0851c] -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9) [0x7f5ae2ea3949] ) 0-client_t: null client [Invalid argument]
> >> > >>> >>> > [2017-08-23 06:46:02.154903] E [client_t.c:324:gf_client_ref] (-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1) [0x7f5ae2c091b1] -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c) [0x7f5ae2c0851c] -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9) [0x7f5ae2ea3949] ) 0-client_t: null client [Invalid argument]
> >> > >>> >>> > [2017-08-23 06:46:02.154995] E [client_t.c:324:gf_client_ref] (-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1) [0x7f5ae2c091b1] -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c) [0x7f5ae2c0851c] -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9) [0x7f5ae2ea3949] ) 0-client_t: null client [Invalid argument]
> >> > >>> >>> > [2017-08-23 06:46:02.155052] E [client_t.c:324:gf_client_ref] (-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1) [0x7f5ae2c091b1] -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c) [0x7f5ae2c0851c] -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9) [0x7f5ae2ea3949] ) 0-client_t: null client [Invalid argument]
> >> > >>> >>> > [2017-08-23 06:46:02.155141] E [client_t.c:324:gf_client_ref] (-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1) [0x7f5ae2c091b1] -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c) [0x7f5ae2c0851c] -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9) [0x7f5ae2ea3949] ) 0-client_t: null client [Invalid argument]
> >> > >>> >>> > [2017-08-23 06:46:27.074052] E [client_t.c:324:gf_client_ref] (-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1) [0x7f5ae2c091b1] -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c) [0x7f5ae2c0851c] -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9) [0x7f5ae2ea3949] ) 0-client_t: null client [Invalid argument]
> >> > >>> >>> > [2017-08-23 06:46:27.077034] E [client_t.c:324:gf_client_ref] (-->/usr/lib64/libgfrpc.so.0(rpcsvc_request_create+0xf1) [0x7f5ae2c091b1] -->/usr/lib64/libgfrpc.so.0(rpcsvc_request_init+0x9c) [0x7f5ae2c0851c] -->/usr/lib64/libglusterfs.so.0(gf_client_ref+0x1a9) [0x7f5ae2ea3949] ) 0-client_t: null client [Invalid argument]
> >> > >>> >>> >
> >> > >>> >>> > On Tue, Aug 22, 2017 at 7:00 PM, Serkan Çoban <cobanserkan@xxxxxxxxx>
> >> > >>> >>> > wrote:
> >> > >>> >>> >> I rebooted multiple times, and I also destroyed and recreated the
> >> > >>> >>> >> gluster configuration multiple times. The behavior is the same.
> >> > >>> >>> >>
> >> > >>> >>> >> On Tue, Aug 22, 2017 at 6:47 PM, Atin Mukherjee <amukherj@xxxxxxxxxx>
> >> > >>> >>> >> wrote:
> >> > >>> >>> >>> My guess is that there is corruption in the volume list or peer
> >> > >>> >>> >>> list which has led glusterd into an infinite loop traversing the
> >> > >>> >>> >>> peer/volume list, hogging the CPU.
> >> > >>> >>> >>> Again, this is a guess; I haven't had a chance to take a detailed
> >> > >>> >>> >>> look at the logs and the strace output.
> >> > >>> >>> >>>
> >> > >>> >>> >>> I believe that if you reboot the node again, the problem will
> >> > >>> >>> >>> disappear.
> >> > >>> >>> >>>
> >> > >>> >>> >>> On Tue, 22 Aug 2017 at 20:07, Serkan Çoban <cobanserkan@xxxxxxxxx>
> >> > >>> >>> >>> wrote:
> >> > >>> >>> >>>>
> >> > >>> >>> >>>> In addition, perf top shows 80% in libc-2.12.so __strcmp_sse42
> >> > >>> >>> >>>> while glusterd is at 100% CPU usage.
> >> > >>> >>> >>>> Hope this helps...
> >> > >>> >>> >>>>
> >> > >>> >>> >>>> On Tue, Aug 22, 2017 at 2:41 PM, Serkan Çoban <cobanserkan@xxxxxxxxx>
> >> > >>> >>> >>>> wrote:
> >> > >>> >>> >>>> > Hi there,
> >> > >>> >>> >>>> >
> >> > >>> >>> >>>> > I have a strange problem.
> >> > >>> >>> >>>> > The Gluster version is 3.10.5, and I am testing new servers. The
> >> > >>> >>> >>>> > Gluster configuration is 16+4 EC; I have three volumes, each with
> >> > >>> >>> >>>> > 1600 bricks.
> >> > >>> >>> >>>> > I can successfully create the cluster and volumes without any
> >> > >>> >>> >>>> > problems. I wrote data to the cluster from 100 clients for 12
> >> > >>> >>> >>>> > hours, again with no problem. But when I try to reboot a node, the
> >> > >>> >>> >>>> > glusterd process hangs at 100% CPU usage and seems to do nothing;
> >> > >>> >>> >>>> > no brick processes come online.
> >> > >>> >>> >>>> > You can find an strace of the glusterd process covering 1 minute
> >> > >>> >>> >>>> > here:
> >> > >>> >>> >>>> >
> >> > >>> >>> >>>> > https://www.dropbox.com/s/c7bxfnbqxze1yus/gluster_strace.out?dl=0
> >> > >>> >>> >>>> >
> >> > >>> >>> >>>> > Here are the glusterd logs:
> >> > >>> >>> >>>> > https://www.dropbox.com/s/hkstb3mdeil9a5u/glusterd.log?dl=0
> >> > >>> >>> >>>> >
> >> > >>> >>> >>>> > By the way, rebooting one server completes without problems if I
> >> > >>> >>> >>>> > reboot the servers before creating any volumes.
> >> > >>> >>> >>>> _______________________________________________
> >> > >>> >>> >>>> Gluster-users mailing list
> >> > >>> >>> >>>> Gluster-users@xxxxxxxxxxx
> >> > >>> >>> >>>> http://lists.gluster.org/mailman/listinfo/gluster-users
> >> > >>> >>> >>>
> >> > >>> >>> >>> --
> >> > >>> >>> >>> - Atin (atinm)
> >> > >>> >>
> >> > >>> >> --
> >> > >>> >> - Atin (atinm)
> >> > >>> >
> >> > >>> > --
> >> > >>> > - Atin (atinm)
> >> >
> >> > --
> >> > Milind
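Some of the evidence in this thread points to a very slow but finite loop rather than an infinite one: the busy thread sits in `__strcmp_sse42` under `dict_lookup_common` → `dict_set` while `glusterd_add_volumes_to_export_dict` runs, perf top attributes about 80% of CPU to `strcmp`, and 3.10.5 recovers on its own after 60-70 minutes. The toy Python model below illustrates that reading under one loudly stated assumption: every insert into the dictionary first scans all existing keys, so building a dictionary of n keys costs n(n-1)/2 string comparisons. The thread does not establish whether libglusterfs's `dict_t` actually behaves this way for the export dictionary. The `LinearDict` class, the key naming, and the keys-per-brick count are all hypothetical.

```python
from typing import List, Tuple


class LinearDict:
    """Toy key/value store that finds keys by linear scan (assumed model).

    Every set() first compares the key against each stored key, and
    counts those string comparisons in self.comparisons.
    """

    def __init__(self) -> None:
        self._items: List[Tuple[str, str]] = []
        self.comparisons = 0

    def set(self, key: str, value: str) -> None:
        for i, (existing, _) in enumerate(self._items):
            self.comparisons += 1
            if existing == key:
                self._items[i] = (key, value)  # overwrite in place
                return
        self._items.append((key, value))  # new key goes at the end


def export_cost(volumes: int, bricks_per_volume: int, keys_per_brick: int) -> int:
    """Comparisons needed to insert every (hypothetical) brick key once."""
    d = LinearDict()
    for v in range(volumes):
        for b in range(bricks_per_volume):
            for k in range(keys_per_brick):
                # Hypothetical, unique key naming; not glusterd's real keys.
                d.set(f"volume{v}.brick{b}.key{k}", "x")
    return d.comparisons
```

With the layout in this thread (three volumes of 1600 bricks), even one key per brick gives 4800 keys and 4800 × 4799 / 2 = 11,517,600 comparisons per build under this model. Doubling the key count roughly quadruples the cost. This would also be consistent with reboots completing normally before any volumes exist, but it remains an illustration, not a diagnosis.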