Re: Gluster errors create zombie processes [LOGS ATTACHED]

I don't have the volfiles; as I said previously, they are not on our machines and we have no control over the gluster servers.

I did find a graph in the logs that looks similar to a volume file. I will paste it below, but we have no influence over it; we are only using the client to connect to gluster servers that we do not manage.

1: volume drslk-prod-client-0
  2:     type protocol/client
  3:     option ping-timeout 20
  4:     option remote-host brick13.gluster.iadm
  5:     option remote-subvolume /GLUSTERFS/drslk-prod
  6:     option transport-type socket
  7:     option frame-timeout 60
  8:     option send-gids true
  9: end-volume
 10:  
 11: volume drslk-prod-client-1
 12:     type protocol/client
 13:     option ping-timeout 20
 14:     option remote-host brick14.gluster.iadm
 15:     option remote-subvolume /GLUSTERFS/drslk-prod
 16:     option transport-type socket
 17:     option frame-timeout 60
 18:     option send-gids true
 19: end-volume
 20:  
 21: volume drslk-prod-client-2
 22:     type protocol/client
 23:     option ping-timeout 20
 24:     option remote-host brick15.gluster.iadm
 25:     option remote-subvolume /GLUSTERFS/drslk-prod
 26:     option transport-type socket
 27:     option frame-timeout 60
 28:     option send-gids true
 29: end-volume
 30:  
 31: volume drslk-prod-replicate-0
 32:     type cluster/replicate
 33:     option read-hash-mode 2
 34:     option data-self-heal-window-size 128
 35:     option quorum-type auto
 36:     subvolumes drslk-prod-client-0 drslk-prod-client-1 drslk-prod-client-2
 37: end-volume
 38:  
 39: volume drslk-prod-client-3
 40:     type protocol/client
 41:     option ping-timeout 20
 42:     option remote-host brick16.gluster.iadm
 43:     option remote-subvolume /GLUSTERFS/drslk-prod
 44:     option transport-type socket
 45:     option frame-timeout 60
 46:     option send-gids true
 47: end-volume
 48:  
 49: volume drslk-prod-client-4
 50:     type protocol/client
 51:     option ping-timeout 20
 52:     option remote-host brick17.gluster.iadm
 53:     option remote-subvolume /GLUSTERFS/drslk-prod
 54:     option transport-type socket
 55:     option frame-timeout 60
 56:     option send-gids true
 57: end-volume
 58:  
 59: volume drslk-prod-client-5
 60:     type protocol/client
 61:     option ping-timeout 20
 62:     option remote-host brick18.gluster.iadm
 63:     option remote-subvolume /GLUSTERFS/drslk-prod
 64:     option transport-type socket
 65:     option frame-timeout 60
 66:     option send-gids true
 67: end-volume
 68:  
 69: volume drslk-prod-replicate-1
 70:     type cluster/replicate
 71:     option read-hash-mode 2
 72:     option data-self-heal-window-size 128
 73:     option quorum-type auto
 74:     subvolumes drslk-prod-client-3 drslk-prod-client-4 drslk-prod-client-5
 75: end-volume
 76:  
 77: volume drslk-prod-client-6
 78:     type protocol/client
 79:     option ping-timeout 20
 80:     option remote-host brick19.gluster.iadm
 81:     option remote-subvolume /GLUSTERFS/drslk-prod
 82:     option transport-type socket
 83:     option frame-timeout 60
 84:     option send-gids true
 85: end-volume
 86:  
 87: volume drslk-prod-client-7
 88:     type protocol/client
 89:     option ping-timeout 20
 90:     option remote-host brick20.gluster.iadm
 91:     option remote-subvolume /GLUSTERFS/drslk-prod
 92:     option transport-type socket
 93:     option frame-timeout 60
 94:     option send-gids true
 95: end-volume
 96:  
 97: volume drslk-prod-client-8
 98:     type protocol/client
 99:     option ping-timeout 20
100:     option remote-host brick21.gluster.iadm
101:     option remote-subvolume /GLUSTERFS/drslk-prod
102:     option transport-type socket
103:     option frame-timeout 60
104:     option send-gids true
105: end-volume
106:  
107: volume drslk-prod-replicate-2
108:     type cluster/replicate
109:     option read-hash-mode 2
110:     option data-self-heal-window-size 128
111:     option quorum-type auto
112:     subvolumes drslk-prod-client-6 drslk-prod-client-7 drslk-prod-client-8
113: end-volume
114:  
115: volume drslk-prod-client-9
116:     type protocol/client
117:     option ping-timeout 20
118:     option remote-host brick22.gluster.iadm
119:     option remote-subvolume /GLUSTERFS/drslk-prod
120:     option transport-type socket
121:     option frame-timeout 60
122:     option send-gids true
123: end-volume
124:  
125: volume drslk-prod-client-10
126:     type protocol/client
127:     option ping-timeout 20
128:     option remote-host brick23.gluster.iadm
129:     option remote-subvolume /GLUSTERFS/drslk-prod
130:     option transport-type socket
131:     option frame-timeout 60
132:     option send-gids true
133: end-volume
134:  
135: volume drslk-prod-client-11
136:     type protocol/client
137:     option ping-timeout 20
138:     option remote-host brick24.gluster.iadm
139:     option remote-subvolume /GLUSTERFS/drslk-prod
140:     option transport-type socket
141:     option frame-timeout 60
142:     option send-gids true
143: end-volume
144:  
145: volume drslk-prod-replicate-3
146:     type cluster/replicate
147:     option read-hash-mode 2
148:     option data-self-heal-window-size 128
149:     option quorum-type auto
150:     subvolumes drslk-prod-client-9 drslk-prod-client-10 drslk-prod-client-11
151: end-volume
152:  
153: volume drslk-prod-dht
154:     type cluster/distribute
155:     option min-free-disk 10%
156:     option readdir-optimize on
157:     subvolumes drslk-prod-replicate-0 drslk-prod-replicate-1 drslk-prod-replicate-2 drslk-prod-replicate-3
158: end-volume
159:  
160: volume drslk-prod-write-behind
161:     type performance/write-behind
162:     option cache-size 1MB
163:     subvolumes drslk-prod-dht
164: end-volume
165:  
166: volume drslk-prod-read-ahead
167:     type performance/read-ahead
168:     subvolumes drslk-prod-write-behind
169: end-volume
170:  
171: volume drslk-prod-readdir-ahead
172:     type performance/readdir-ahead
173:     subvolumes drslk-prod-read-ahead
174: end-volume
175:  
176: volume drslk-prod-io-cache
177:     type performance/io-cache
178:     option cache-timeout 60
179:     option cache-size 512MB
180:     subvolumes drslk-prod-readdir-ahead
181: end-volume
182:  
183: volume drslk-prod-quick-read
184:     type performance/quick-read
185:     option cache-size 512MB
186:     subvolumes drslk-prod-io-cache
187: end-volume
188:  
189: volume drslk-prod-md-cache
190:     type performance/md-cache
191:     subvolumes drslk-prod-quick-read
192: end-volume
193:  
194: volume drslk-prod
195:     type debug/io-stats
196:     option latency-measurement off
197:     option count-fop-hits off
198:     subvolumes drslk-prod-md-cache
199: end-volume
200:  
201: volume meta-autoload
202:     type meta
203:     subvolumes drslk-prod
204: end-volume
205:  
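
(For reference, the graph above is taken from the client mount log, which dumps it right after a "Final graph:" line. On our Ubuntu client that log should be something like /var/log/glusterfs/mnt-storage.log for the /mnt/storage mount; the exact file name is my assumption.)

  grep -n 'Final graph' /var/log/glusterfs/mnt-storage.log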

Btw, do you think the different gluster client and server versions could be an issue here?
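
For what it's worth, this is roughly how the two versions could be compared on each side (the package-listing commands are assumptions about how gluster was installed):

  # on our Ubuntu 14.04 client
  glusterfs --version
  dpkg -l | grep -i gluster

  # on the Red Hat servers (the hosting admins would have to run this)
  gluster --version
  rpm -qa | grep -i gluster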

2015-03-08 1:29 GMT+01:00 Vijay Bellur <vbellur@xxxxxxxxxx>:
On 03/07/2015 06:20 PM, Przemysław Mroczek wrote:
Hi guys,

We have a Rails app which uses gluster for our distributed file system. The gluster servers are hosted independently as part of a deal with another company; we have no control over them, and we connect to them using the gluster native client.

We tried to resolve this issue with help from the admins of the company that hosts our gluster servers, but they say it is a client-side issue, and we have run out of ideas as to how that could be, since we are not doing anything special here.

Information about the independent gluster servers:
- version: 3.6.0.42.1
- they are running Red Hat
- they are an enterprise setup, so they always run older versions

Our servers:
System version: Ubuntu 14.04
Our gluster client version: 3.6.2

The exact problem is that it happens fairly often (a couple of times a week) that errors in gluster cause processes to become zombies. It happens to our application server (unicorn), to nginx, and to our crawling script that runs as a daemon.
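
(A quick way to spot the defunct processes and their parents when this happens could be something along these lines; the exact ps/awk invocation is only a sketch.)

  # list zombie (Z state) processes together with their parent PIDs
  ps -eo pid,ppid,stat,comm | awk 'NR==1 || $3 ~ /^Z/'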

Our fstab file:

10.10.11.17:/drslk-prod    /mnt/storage    glusterfs    defaults,_netdev,nobootwait,fetch-attempts=10    0 0
10.10.11.17:/drslk-backup  /mnt/backup     glusterfs    defaults,_netdev,nobootwait,fetch-attempts=10    0 0
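
(As a side note, both mounts point at the single volfile server 10.10.11.17. If the servers allow it, additional volfile servers can usually be listed in the mount options; the line below is only a sketch, and both the backup-volfile-servers option name and the extra addresses are assumptions on my part.)

  10.10.11.17:/drslk-prod  /mnt/storage  glusterfs  defaults,_netdev,nobootwait,fetch-attempts=10,backup-volfile-servers=10.10.11.18:10.10.11.19  0 0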

Logs from gluster:

[2015-02-18 12:36:12.375695] E [rpc-clnt.c:362:saved_frames_unwind] (--> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x186)[0x7fb41ddeada6] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7fb41dbc1c7e] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7fb41dbc1d8e] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x82)[0x7fb41dbc3602] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0x48)[0x7fb41dbc3d98] ))))) 0-drslk-prod-client-10: forced unwinding frame type(GlusterFS 3.3) op(LOOKUP(27)) called at 2015-02-18 12:36:12.361489 (xid=0x5d475da)
[2015-02-18 12:36:12.375765] W [client-rpc-fops.c:2766:client3_3_lookup_cbk] 0-drslk-prod-client-10: remote operation failed: Transport endpoint is not connected. Path: /system/posts/00/00/71/77/59.jpg (2ad81c2b-a141-478d-9dd4-253345edbceb)
[2015-02-18 12:36:12.376288] E [rpc-clnt.c:362:saved_frames_unwind] (--> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x186)[0x7fb41ddeada6] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7fb41dbc1c7e] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7fb41dbc1d8e] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x82)[0x7fb41dbc3602] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0x48)[0x7fb41dbc3d98] ))))) 0-drslk-prod-client-10: forced unwinding frame type(GlusterFS 3.3) op(LOOKUP(27)) called at 2015-02-18 12:36:12.361858 (xid=0x5d475db)
[2015-02-18 12:36:12.376355] W [client-rpc-fops.c:2766:client3_3_lookup_cbk] 0-drslk-prod-client-10: remote operation failed: Transport endpoint is not connected. Path: /system/posts/00/00/08 (f5c33a99-719e-4ea2-ad1f-33b893af103d)
[2015-02-18 12:36:12.376711] I [socket.c:3292:socket_submit_request] 0-drslk-prod-client-10: not connected (priv->connected = 0)
[2015-02-18 12:36:12.376749] W [rpc-clnt.c:1562:rpc_clnt_submit] 0-drslk-prod-client-10: failed to submit rpc-request (XID: 0x5d475dc Program: GlusterFS 3.3, ProgVers: 330, Proc: 27) to rpc-transport (drslk-prod-client-10)
[2015-02-18 12:36:12.376814] W [client-rpc-fops.c:2766:client3_3_lookup_cbk] 0-drslk-prod-client-10: remote operation failed: Transport endpoint is not connected. Path: (null) (00000000-0000-0000-0000-000000000000)
[2015-02-18 12:36:12.376829] I [client.c:2215:client_rpc_notify] 0-drslk-prod-client-10: disconnected from drslk-prod-client-10. Client process will keep trying to connect to glusterd until brick's port is available
[2015-02-18 12:36:12.376834] W [rpc-clnt.c:1562:rpc_clnt_submit] 0-drslk-prod-client-10: failed to submit rpc-request (XID: 0x5d475dd Program: GlusterFS 3.3, ProgVers: 330, Proc: 27) to rpc-transport (drslk-prod-client-10)
[2015-02-18 12:36:12.376906] W [client-rpc-fops.c:2766:client3_3_lookup_cbk] 0-drslk-prod-client-10: remote operation failed: Transport endpoint is not connected. Path: (null) (00000000-0000-0000-0000-000000000000)
[2015-02-18 12:36:12.376931] E [socket.c:2267:socket_connect_finish] 0-drslk-prod-client-10: connection to 10.10.11.23:24007 failed (Connection refused)

[2015-02-18 12:36:12.379296] W [client-rpc-fops.c:2766:client3_3_lookup_cbk] 0-drslk-prod-client-10: remote operation failed: Transport endpoint is not connected. Path: (null) (00000000-0000-0000-0000-000000000000)
[2015-02-18 12:36:12.379700] W [client-rpc-fops.c:2766:client3_3_lookup_cbk] 0-drslk-prod-client-10: remote operation failed: Transport endpoint is not connected. Path: (null) (00000000-0000-0000-0000-000000000000)
[2015-02-18 13:10:52.759736] E [client-handshake.c:1496:client_query_portmap_cbk] 0-drslk-prod-client-10: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running.
[2015-02-18 13:10:52.759796] I [client.c:2215:client_rpc_notify] 0-drslk-prod-client-10: disconnected from drslk-prod-client-10. Client process will keep trying to connect to glusterd until brick's port is available
[2015-02-18 13:11:02.897307] I [rpc-clnt.c:1761:rpc_clnt_reconfig] 0-drslk-prod-client-10: changing port to 49349 (from 0)
[2015-02-18 13:11:02.898097] I [client-handshake.c:1413:select_server_supported_programs] 0-drslk-prod-client-10: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2015-02-18 13:11:02.898446] I [client-handshake.c:1200:client_setvolume_cbk] 0-drslk-prod-client-10: Connected to drslk-prod-client-10, attached to remote volume '/GLUSTERFS/drslk-prod'.
[2015-02-18 13:11:02.898460] I [client-handshake.c:1210:client_setvolume_cbk] 0-drslk-prod-client-10: Server and Client lk-version numbers are not same, reopening the fds
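
(When the log reports "Connection refused" to a brick host's glusterd port, as with 10.10.11.23:24007 above, a basic reachability check from the client could look like the sketch below.)

  ping -c 3 10.10.11.23
  nc -vz 10.10.11.23 24007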


Can you provide the gluster volume configuration details?

It does look like frame-timeout for the volume has been set to 60. Is there any specific reason? Normally altering the frame-timeout is not recommended.
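
(For reference, the stock default for network.frame-timeout is 1800 seconds; reverting it would be a server-side command roughly like the one below, which only the hosting admins could run. This is just a sketch.)

  gluster volume set drslk-prod network.frame-timeout 1800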

-Vijay


_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users
