On 01/28/2016 07:48 PM, David Robinson wrote:
> Something really bad related to locks is happening. Did you guys patch
> the recent memory corruption bug which only affects workloads with more
> than 128 clients?
> http://review.gluster.org/13241
We have not applied that patch. Will this be included in the
3.6.7 release? If so, do you know when that version will be
released?
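In the meantime, if you want to try the fix locally, it should be possible to cherry-pick it from Gerrit onto your 3.6.x tree with something along these lines (a rough sketch assuming the usual refs/changes layout; replace <N> with the patchset that actually got merged):

# fetch change 13241 from review.gluster.org and apply it
git fetch http://review.gluster.org/glusterfs refs/changes/41/13241/<N>
git cherry-pick FETCH_HEAD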
+ Raghavendra Bhat
Could you please let David know about the next release date?
David
------ Original Message ------
Sent: 1/28/2016 5:10:07 AM
Subject: Re: [Gluster-devel] heal hanging
On 01/25/2016 11:10 PM, David Robinson wrote:
It is doing it again... statedump from gfs02a is
attached...
David,
I see a lot of traffic from [f]inodelks:
15:09:00 :) ⚡ grep wind_from data-brick02a-homegfs.4066.dump.1453742225 | sort | uniq -c
11 unwind_from=default_finodelk_cbk
11 unwind_from=io_stats_finodelk_cbk
11 unwind_from=pl_common_inodelk
1133 wind_from=default_finodelk_resume
1 wind_from=default_inodelk_resume
75 wind_from=index_getxattr
6 wind_from=io_stats_entrylk
12776 wind_from=io_stats_finodelk
15 wind_from=io_stats_flush
75 wind_from=io_stats_getxattr
4 wind_from=io_stats_inodelk
4 wind_from=io_stats_lk
4 wind_from=io_stats_setattr
75 wind_from=marker_getxattr
4 wind_from=marker_setattr
75 wind_from=quota_getxattr
6 wind_from=server_entrylk_resume
12776 wind_from=server_finodelk_resume   <<--------------
15 wind_from=server_flush_resume
75 wind_from=server_getxattr_resume
4 wind_from=server_inodelk_resume
4 wind_from=server_lk_resume
4 wind_from=server_setattr_resume
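For reference, the imbalance is easy to pull straight out of the dump (a rough sketch; the field names are exactly the ones in the grep output above):

DUMP=data-brick02a-homegfs.4066.dump.1453742225
# finodelk frames wound into the brick vs. frames already unwound back out
grep -c '^wind_from=io_stats_finodelk$' "$DUMP"        # 12776
grep -c '^unwind_from=io_stats_finodelk_cbk$' "$DUMP"  # 11

So, if I am reading the dump right, thousands of finodelk calls are sitting inside the brick without a reply.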
But there are very few active locks:
pk1@localhost - ~/Downloads
15:09:07 :) ⚡ grep ACTIVE data-brick02a-homegfs.4066.dump.1453742225
inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0,
start=9223372036854775806, len=0, pid = 11678,
owner=b42fff03ce7f0000, client=0x13d2cd0,
connection-id=corvidpost3.corvidtec.com-52656-2016/01/22-16:40:31:459920-homegfs-client-6-0-1,
granted at 2016-01-25 17:16:06
inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0,
len=0, pid = 15759, owner=b8ca8c0100000000, client=0x189e470,
connection-id=corvidpost4.corvidtec.com-17718-2016/01/22-16:40:31:221380-homegfs-client-6-0-1,
granted at 2016-01-25 17:12:52
inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0,
start=9223372036854775806, len=0, pid = 7103,
owner=0cf31a98f87f0000, client=0x2201d60,
connection-id=zlv-bangell-4812-2016/01/25-13:45:52:170157-homegfs-client-6-0-0,
granted at 2016-01-25 17:09:56
inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0,
start=9223372036854775806, len=0, pid = 55764,
owner=882dbea1417f0000, client=0x17fc940,
connection-id=corvidpost.corvidtec.com-35961-2016/01/22-16:40:31:88946-homegfs-client-6-0-1,
granted at 2016-01-25 17:06:12
inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0,
start=9223372036854775806, len=0, pid = 21129,
owner=3cc068a1e07f0000, client=0x1495040,
connection-id=corvidpost2.corvidtec.com-43400-2016/01/22-16:40:31:248771-homegfs-client-6-0-1,
granted at 2016-01-25 17:15:53
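It might also be worth comparing granted locks against waiting ones in the same dump (blocked locks should show up with a BLOCKED tag, the same way the granted ones show up as ACTIVE):

DUMP=data-brick02a-homegfs.4066.dump.1453742225
# count granted vs. waiting lock entries
grep -c 'ACTIVE' "$DUMP"
grep -c 'BLOCKED' "$DUMP"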
One more odd thing I found is the following:
[2016-01-15 14:03:06.910687] C
[rpc-clnt-ping.c:109:rpc_clnt_ping_timer_expired]
0-homegfs-client-2: server 10.200.70.1:49153 has not responded
in the last 10 seconds, disconnecting.
[2016-01-15 14:03:06.910886] E
[rpc-clnt.c:362:saved_frames_unwind] (-->
/usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x1e0)[0x2b74c289a580]
(-->
/usr/lib64/libgfrpc.so.0(saved_frames_unwind+0x1e7)[0x2b74c2b27787]
(-->
/usr/lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x2b74c2b2789e]
(-->
/usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x91)[0x2b74c2b27951]
(-->
/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x15f)[0x2b74c2b27f1f]
))))) 0-homegfs-client-2: forced unwinding frame
type(GlusterFS 3.3) op(FINODELK(30)) called at 2016-01-15
10:30:09.487422 (xid=0x11ed3f)
FINODELK was called at 2016-01-15 10:30:09.487422, but the
response still had not come by 14:03:06. That is almost 3.5
hours!!
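On a side note, the 10-second disconnect above suggests network.ping-timeout has been lowered from its 42-second default on this setup. That is easy to confirm (volume name taken from the connection-ids above):

# prints the option only if it was reconfigured; no output means the default is in use
gluster volume info homegfs | grep -i ping-timeout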
Something really bad related to locks is happening. Did you
guys patch the recent memory corruption bug which only affects
workloads with more than 128 clients? http://review.gluster.org/13241
Pranith
------ Original Message ------
Sent: 1/24/2016 9:27:02 PM
Subject: Re: [Gluster-devel] heal hanging
It seems like there is a lot of finodelk/inodelk traffic. I
wonder why that is. I think the next step is to collect a
statedump of the brick which is taking a lot of CPU, using
"gluster volume statedump <volname>"
Pranith
On 01/22/2016 08:36 AM, Glomski, Patrick wrote:
Pranith, attached are stack traces collected
every second for 20 seconds from the high-%cpu
glusterfsd process.
Patrick
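(For anyone wanting to capture the same kind of data later: a loop like the one below, run on the brick node, should give equivalent traces. It assumes gdb/gstack is installed and that the pgrep pattern is adjusted to match the busy brick; the pattern here is only a guess based on the brick path seen elsewhere in this thread.)

# grab one stack trace per second for 20 seconds from the busy glusterfsd
PID=$(pgrep -f 'glusterfsd.*brick02a.*homegfs' | head -n1)
for i in $(seq 1 20); do
    gstack "$PID" > "glusterfsd.stack.$i"
    sleep 1
done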