On 03/08/2015 09:36 AM, Przemysław Mroczek wrote:
I don't have the volfiles; they are not on our machines. As I said previously, we have no control over the gluster servers. I saw a graph in the logs that looks similar to a volume file. I will paste it here, but we don't really have any influence over it. We are just using the client to connect to gluster servers that we do not control.
I would recommend not altering the default frame-timeout.
Btw, do you think that different versions of gluster client and gluster server could be an issue here?
It can potentially be. What versions are you using on the servers and the client?
-Vijay
2015-03-08 1:29 GMT+01:00 Vijay Bellur <vbellur@xxxxxxxxxx>:

On 03/07/2015 06:20 PM, Przemysław Mroczek wrote:

Hi guys,

We have a Rails app that uses gluster as our distributed file system. The gluster servers are hosted independently as part of a deal with another company. We have no control over them; we connect to them using the gluster native client.

We tried to resolve this issue with help from the admins of the company that hosts our gluster servers, but they say it is a client issue, and we have run out of ideas about how that is possible, since we are not doing anything special here.

Information about the independent gluster servers:
- Version: 3.6.0.42.1
- They are using Red Hat
- They are an enterprise, so they are always using older versions

Our servers:
System version: Ubuntu 14.04
Our gluster client version: 3.6.2

The exact problem is that it often happens (a couple of times a week) that errors in gluster cause processes to become zombies. It happens with our application server (unicorn), nginx, and our crawling script that runs as a daemon.
Our fstab file:

10.10.11.17:/drslk-prod /mnt/storage glusterfs defaults,_netdev,nobootwait,fetch-attempts=10 0 0
10.10.11.17:/drslk-backup /mnt/backup glusterfs defaults,_netdev,nobootwait,fetch-attempts=10 0 0

Logs from gluster:

[2015-02-18 12:36:12.375695] E [rpc-clnt.c:362:saved_frames_unwind] (--> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x186)[0x7fb41ddeada6] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7fb41dbc1c7e] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7fb41dbc1d8e] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x82)[0x7fb41dbc3602] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0x48)[0x7fb41dbc3d98] ))))) 0-drslk-prod-client-10: forced unwinding frame type(GlusterFS 3.3) op(LOOKUP(27)) called at 2015-02-18 12:36:12.361489 (xid=0x5d475da)
[2015-02-18 12:36:12.375765] W [client-rpc-fops.c:2766:client3_3_lookup_cbk] 0-drslk-prod-client-10: remote operation failed: Transport endpoint is not connected. Path: /system/posts/00/00/71/77/59.jpg (2ad81c2b-a141-478d-9dd4-253345edbceb)
[2015-02-18 12:36:12.376288] E [rpc-clnt.c:362:saved_frames_unwind] (--> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x186)[0x7fb41ddeada6] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7fb41dbc1c7e] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7fb41dbc1d8e] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x82)[0x7fb41dbc3602] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0x48)[0x7fb41dbc3d98] ))))) 0-drslk-prod-client-10: forced unwinding frame type(GlusterFS 3.3) op(LOOKUP(27)) called at 2015-02-18 12:36:12.361858 (xid=0x5d475db)
[2015-02-18 12:36:12.376355] W [client-rpc-fops.c:2766:client3_3_lookup_cbk] 0-drslk-prod-client-10: remote operation failed: Transport endpoint is not connected. Path: /system/posts/00/00/08 (f5c33a99-719e-4ea2-ad1f-33b893af103d)
[2015-02-18 12:36:12.376711] I [socket.c:3292:socket_submit_request] 0-drslk-prod-client-10: not connected (priv->connected = 0)
[2015-02-18 12:36:12.376749] W [rpc-clnt.c:1562:rpc_clnt_submit] 0-drslk-prod-client-10: failed to submit rpc-request (XID: 0x5d475dc Program: GlusterFS 3.3, ProgVers: 330, Proc: 27) to rpc-transport (drslk-prod-client-10)
[2015-02-18 12:36:12.376814] W [client-rpc-fops.c:2766:client3_3_lookup_cbk] 0-drslk-prod-client-10: remote operation failed: Transport endpoint is not connected. Path: (null) (00000000-0000-0000-0000-000000000000)
[2015-02-18 12:36:12.376829] I [client.c:2215:client_rpc_notify] 0-drslk-prod-client-10: disconnected from drslk-prod-client-10. Client process will keep trying to connect to glusterd until brick's port is available
[2015-02-18 12:36:12.376834] W [rpc-clnt.c:1562:rpc_clnt_submit] 0-drslk-prod-client-10: failed to submit rpc-request (XID: 0x5d475dd Program: GlusterFS 3.3, ProgVers: 330, Proc: 27) to rpc-transport (drslk-prod-client-10)
[2015-02-18 12:36:12.376906] W [client-rpc-fops.c:2766:client3_3_lookup_cbk] 0-drslk-prod-client-10: remote operation failed: Transport endpoint is not connected. Path: (null) (00000000-0000-0000-0000-000000000000)
[2015-02-18 12:36:12.376931] E [socket.c:2267:socket_connect_finish] 0-drslk-prod-client-10: connection to 10.10.11.23:24007 failed (Connection refused)
[2015-02-18 12:36:12.379296] W [client-rpc-fops.c:2766:client3_3_lookup_cbk] 0-drslk-prod-client-10: remote operation failed: Transport endpoint is not connected. Path: (null) (00000000-0000-0000-0000-000000000000)
[2015-02-18 12:36:12.379700] W [client-rpc-fops.c:2766:client3_3_lookup_cbk] 0-drslk-prod-client-10: remote operation failed: Transport endpoint is not connected. Path: (null) (00000000-0000-0000-0000-000000000000)
[2015-02-18 13:10:52.759736] E [client-handshake.c:1496:client_query_portmap_cbk] 0-drslk-prod-client-10: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running.
[2015-02-18 13:10:52.759796] I [client.c:2215:client_rpc_notify] 0-drslk-prod-client-10: disconnected from drslk-prod-client-10. Client process will keep trying to connect to glusterd until brick's port is available
[2015-02-18 13:11:02.897307] I [rpc-clnt.c:1761:rpc_clnt_reconfig] 0-drslk-prod-client-10: changing port to 49349 (from 0)
[2015-02-18 13:11:02.898097] I [client-handshake.c:1413:select_server_supported_programs] 0-drslk-prod-client-10: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2015-02-18 13:11:02.898446] I [client-handshake.c:1200:client_setvolume_cbk] 0-drslk-prod-client-10: Connected to drslk-prod-client-10, attached to remote volume '/GLUSTERFS/drslk-prod'.
[2015-02-18 13:11:02.898460] I [client-handshake.c:1210:client_setvolume_cbk] 0-drslk-prod-client-10: Server and Client lk-version numbers are not same, reopening the fds

Can you provide the gluster volume configuration details? It does look like frame-timeout for the volume has been set to 60. Is there any specific reason? Normally altering the frame-timeout is not recommended.

-Vijay
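
For anyone reading a client log like the one quoted above, it can help to count disconnects, reconnects and failed operations per client translator before looking at individual entries. The following is only a rough sketch in Python, not a gluster tool. The regular expression is inferred from the entries quoted in this thread, and the names LOG_LINE, EVENTS, classify and summarize are made up for illustration.

```python
import re
from collections import Counter, defaultdict

# Assumed line layout, inferred from the entries quoted in this thread:
# [timestamp] LEVEL [file:line:function] (optional backtrace) translator: message
LOG_LINE = re.compile(
    r"^\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\]\s+"
    r"(?P<level>[A-Z])\s+"
    r"\[(?P<src>[^\]]+)\]\s+"
    r"(?:\(.*?\)\s+)?"                  # skip the "(--> ... )))))" backtrace, if any
    r"(?P<xlator>\d+-[\w.-]+):\s+"
    r"(?P<msg>.*)$"
)

# Message fragments taken from the quoted log, mapped to hypothetical labels.
EVENTS = {
    "forced unwinding frame": "unwound_frame",
    "Transport endpoint is not connected": "enotconn",
    "disconnected from": "disconnect",
    "Connected to": "connect",
}


def classify(msg):
    """Return the label of the first known fragment found in msg, else None."""
    for needle, label in EVENTS.items():
        if needle in msg:
            return label
    return None


def summarize(lines):
    """Count labelled events per translator; unparseable lines are skipped."""
    summary = defaultdict(Counter)
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if not match:
            continue
        label = classify(match.group("msg"))
        if label:
            summary[match.group("xlator")][label] += 1
    return summary
```

It could be run as summarize(open(path)), where path is the client log for the mount. The location of that file depends on the installation, so no path is assumed here. A high count of disconnect events for a single translator such as 0-drslk-prod-client-10 would suggest looking at that one brick first.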
_______________________________________________ Gluster-users mailing list Gluster-users@xxxxxxxxxxx http://www.gluster.org/mailman/listinfo/gluster-users