> Something really bad related to locks is happening. Did you guys patch the recent memory corruption bug which only affects workloads with more than 128 clients?
> http://review.gluster.org/13241
We have not applied that patch. Will this be included in the 3.6.7 release? If so, do you know when that version will be released?
David
------ Original Message ------
Sent: 1/28/2016 5:10:07 AM
Subject: Re: [Gluster-devel] heal hanging
On 01/25/2016 11:10 PM, David Robinson wrote:
> It is doing it again... statedump from gfs02a is attached...

David,
I see a lot of traffic from [f]inodelks:

15:09:00 :) ⚡ grep wind_from data-brick02a-homegfs.4066.dump.1453742225 | sort | uniq -c
     11 unwind_from=default_finodelk_cbk
     11 unwind_from=io_stats_finodelk_cbk
     11 unwind_from=pl_common_inodelk
   1133 wind_from=default_finodelk_resume
      1 wind_from=default_inodelk_resume
     75 wind_from=index_getxattr
      6 wind_from=io_stats_entrylk
  12776 wind_from=io_stats_finodelk
     15 wind_from=io_stats_flush
     75 wind_from=io_stats_getxattr
      4 wind_from=io_stats_inodelk
      4 wind_from=io_stats_lk
      4 wind_from=io_stats_setattr
     75 wind_from=marker_getxattr
      4 wind_from=marker_setattr
     75 wind_from=quota_getxattr
      6 wind_from=server_entrylk_resume
  12776 wind_from=server_finodelk_resume <<--------------
     15 wind_from=server_flush_resume
     75 wind_from=server_getxattr_resume
      4 wind_from=server_inodelk_resume
      4 wind_from=server_lk_resume
      4 wind_from=server_setattr_resume
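For anyone who wants to repeat this tally across several dumps, here is a minimal Python sketch of the same count as the grep | sort | uniq -c pipeline above. It assumes the statedump has one key=value entry per line, as the grep output suggests; count_frame_origins is a hypothetical helper name, not part of Gluster.

```python
from collections import Counter


def count_frame_origins(dump_text):
    """Count lines containing 'wind_from=' (this also matches 'unwind_from=')."""
    counts = Counter()
    for line in dump_text.splitlines():
        line = line.strip()
        # Same match rule as `grep wind_from`: a substring test.
        if "wind_from=" in line:
            counts[line] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage (path is whatever statedump file you collected):
    #   python count_frames.py data-brick02a-homegfs.4066.dump.1453742225
    with open(sys.argv[1]) as dump:
        tally = count_frame_origins(dump.read())
    for entry, n in tally.most_common():
        print(f"{n:7d} {entry}")
```

Sorting by count (most_common) instead of by name puts the dominant fop, here the finodelk frames, at the top.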
But there are very few active locks:

pk1@localhost - ~/Downloads
15:09:07 :) ⚡ grep ACTIVE data-brick02a-homegfs.4066.dump.1453742225
inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=9223372036854775806, len=0, pid = 11678, owner=b42fff03ce7f0000, client=0x13d2cd0, connection-id=corvidpost3.corvidtec.com-52656-2016/01/22-16:40:31:459920-homegfs-client-6-0-1, granted at 2016-01-25 17:16:06
inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0, pid = 15759, owner=b8ca8c0100000000, client=0x189e470, connection-id=corvidpost4.corvidtec.com-17718-2016/01/22-16:40:31:221380-homegfs-client-6-0-1, granted at 2016-01-25 17:12:52
inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=9223372036854775806, len=0, pid = 7103, owner=0cf31a98f87f0000, client=0x2201d60, connection-id=zlv-bangell-4812-2016/01/25-13:45:52:170157-homegfs-client-6-0-0, granted at 2016-01-25 17:09:56
inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=9223372036854775806, len=0, pid = 55764, owner=882dbea1417f0000, client=0x17fc940, connection-id=corvidpost.corvidtec.com-35961-2016/01/22-16:40:31:88946-homegfs-client-6-0-1, granted at 2016-01-25 17:06:12
inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=9223372036854775806, len=0, pid = 21129, owner=3cc068a1e07f0000, client=0x1495040, connection-id=corvidpost2.corvidtec.com-43400-2016/01/22-16:40:31:248771-homegfs-client-6-0-1, granted at 2016-01-25 17:15:53
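To see at a glance which client holds each granted lock and since when, the ACTIVE entries can be pulled apart with a small script. This is only a sketch: the regular expression is written against the five entries shown above and assumes one entry per line in the dump file; active_locks and LOCK_RE are hypothetical names, and other statedump versions may lay the fields out differently.

```python
import re

# Field layout assumed from the ACTIVE entries quoted in this thread.
LOCK_RE = re.compile(
    r"\(ACTIVE\)=type=(?P<type>\w+),.*?start=(?P<start>\d+), len=(?P<len>\d+),"
    r".*?connection-id=(?P<conn>\S+), granted at (?P<granted>[\d-]+ [\d:]+)"
)


def active_locks(dump_text):
    """Return ACTIVE lock entries as dicts, oldest grant first."""
    locks = []
    for line in dump_text.splitlines():
        match = LOCK_RE.search(line)
        if match:
            locks.append(match.groupdict())
    # 'YYYY-MM-DD HH:MM:SS' strings sort chronologically as plain text.
    return sorted(locks, key=lambda lock: lock["granted"])
```

Listing the oldest grant first makes it easier to spot a lock that has been held far longer than the others.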
One more odd thing I found is the following:
[2016-01-15 14:03:06.910687] C [rpc-clnt-ping.c:109:rpc_clnt_ping_timer_expired] 0-homegfs-client-2: server 10.200.70.1:49153 has not responded in the last 10 seconds, disconnecting.
[2016-01-15 14:03:06.910886] E [rpc-clnt.c:362:saved_frames_unwind] (--> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x1e0)[0x2b74c289a580] (--> /usr/lib64/libgfrpc.so.0(saved_frames_unwind+0x1e7)[0x2b74c2b27787] (--> /usr/lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x2b74c2b2789e] (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x91)[0x2b74c2b27951] (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x15f)[0x2b74c2b27f1f] ))))) 0-homegfs-client-2: forced unwinding frame type(GlusterFS 3.3) op(FINODELK(30)) called at 2016-01-15 10:30:09.487422 (xid=0x11ed3f)
FINODELK was called at 2016-01-15 10:30:09.487422, but no response had arrived by 14:03:06, when the frame was forcibly unwound. That is about 3.5 hours!!
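As a quick check of that gap, the two timestamps from the log entry (the "called at" time and the time of the forced unwind) can be subtracted directly. A minimal sketch; elapsed_between is a hypothetical helper, and the format string assumes the timestamp layout shown in the log above.

```python
from datetime import datetime

LOG_TIME_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def elapsed_between(called_at, unwound_at):
    """Return the timedelta between two log timestamps."""
    start = datetime.strptime(called_at, LOG_TIME_FORMAT)
    end = datetime.strptime(unwound_at, LOG_TIME_FORMAT)
    return end - start


elapsed = elapsed_between("2016-01-15 10:30:09.487422",
                          "2016-01-15 14:03:06.910886")
print(elapsed)                                   # prints 3:32:57.423464
print(f"{elapsed.total_seconds() / 3600:.2f}")   # prints 3.55
```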
Something really bad related to locks is happening. Did you guys patch the recent memory corruption bug which only affects workloads with more than 128 clients? http://review.gluster.org/13241
Pranith
------ Original Message ------
Sent: 1/24/2016 9:27:02 PM
Subject: Re: [Gluster-devel] heal hanging
It seems like there is a lot of finodelk/inodelk traffic. I wonder why that is. I think the next step is to collect a statedump of the brick that is taking a lot of CPU, using "gluster volume statedump <volname>".
Pranith
On 01/22/2016 08:36 AM, Glomski, Patrick wrote:
Pranith, attached are stack traces collected every second for 20 seconds from the high-%cpu glusterfsd process.
Patrick