Re: Gluster errors create zombie processes [LOGS ATTACHED]


On 03/07/2015 06:20 PM, Przemysław Mroczek wrote:
Hi guys,

We have a Rails app that uses Gluster as its distributed file system.
The Gluster servers are hosted independently by another company as part
of a deal; we have no control over them, and we connect to them using
the Gluster native client.

We tried to resolve this issue with help from the admins of the company
that hosts our Gluster servers, but they say it is a client issue, and
we have run out of ideas about how that is possible, since we are not
doing anything special here.

Information about the independent Gluster servers:
- Version: 3.6.0.42.1
- They are running Red Hat
- They are an enterprise setup, so they always use older versions

Our servers:
System version: Ubuntu 14.04
Our gluster client version: 3.6.2

The exact problem is that it often happens (a couple of times a week)
that errors in Gluster cause processes to become zombies. It happens
with our application server (unicorn), nginx, and our crawling script,
which runs as a daemon.

Our fstab file:

10.10.11.17:/drslk-prod     /mnt/storage          glusterfs defaults,_netdev,nobootwait,fetch-attempts=10 0 0
10.10.11.17:/drslk-backup     /mnt/backup          glusterfs defaults,_netdev,nobootwait,fetch-attempts=10 0 0

Logs from gluster:

[2015-02-18 12:36:12.375695] E [rpc-clnt.c:362:saved_frames_unwind] (-->
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x186)[0x7fb41ddeada6]
(-->
/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7fb41dbc1c7e]
(-->
/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7fb41dbc1d8e]
(-->
/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x82)[0x7fb41dbc3602]
(-->
/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0x48)[0x7fb41dbc3d98]
))))) 0-drslk-prod-client-10: forced
unwinding frame type(GlusterFS 3.3) op(LOOKUP(27)) called at 2015-02-18
12:36:12.361489 (xid=0x5d475da)
[2015-02-18 12:36:12.375765] W
[client-rpc-fops.c:2766:client3_3_lookup_cbk] 0-drslk-prod-client-10:
remote operation failed: Transport endpoint is not connected. Path:
/system/posts/00/00/71/77/59.jpg (2ad81c2b-a141-478d-9dd4-253345edbceb)
[2015-02-18 12:36:12.376288] E [rpc-clnt.c:362:saved_frames_unwind] (-->
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x186)[0x7fb41ddeada6]
(-->
/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7fb41dbc1c7e]
(-->
/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7fb41dbc1d8e]
(-->
/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x82)[0x7fb41dbc3602]
(-->
/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0x48)[0x7fb41dbc3d98]
))))) 0-drslk-prod-client-10: forced
unwinding frame type(GlusterFS 3.3) op(LOOKUP(27)) called at 2015-02-18
12:36:12.361858 (xid=0x5d475db)
[2015-02-18 12:36:12.376355] W
[client-rpc-fops.c:2766:client3_3_lookup_cbk] 0-drslk-prod-client-10:
remote operation failed: Transport endpoint is not connected. Path:
/system/posts/00/00/08 (f5c33a99-719e-4ea2-ad1f-33b893af103d)
[2015-02-18 12:36:12.376711] I [socket.c:3292:socket_submit_request]
0-drslk-prod-client-10: not connected (priv->connected = 0)
[2015-02-18 12:36:12.376749] W [rpc-clnt.c:1562:rpc_clnt_submit]
0-drslk-prod-client-10: failed to submit rpc-request (XID: 0x5d475dc
Program: GlusterFS 3.3, ProgVers: 330, Proc: 27) to rpc-transport
(drslk-prod-client-10)
[2015-02-18 12:36:12.376814] W
[client-rpc-fops.c:2766:client3_3_lookup_cbk] 0-drslk-prod-client-10:
remote operation failed: Transport endpoint is not connected. Path:
(null) (00000000-0000-0000-0000-000000000000)
[2015-02-18 12:36:12.376829] I [client.c:2215:client_rpc_notify]
0-drslk-prod-client-10: disconnected from drslk-prod-client-10. Client
process will keep trying to connect to glusterd until brick's port is
available
[2015-02-18 12:36:12.376834] W [rpc-clnt.c:1562:rpc_clnt_submit]
0-drslk-prod-client-10: failed to submit rpc-request (XID: 0x5d475dd
Program: GlusterFS 3.3, ProgVers: 330, Proc: 27) to rpc-transport
(drslk-prod-client-10)
[2015-02-18 12:36:12.376906] W
[client-rpc-fops.c:2766:client3_3_lookup_cbk] 0-drslk-prod-client-10:
remote operation failed: Transport endpoint is not connected. Path:
(null) (00000000-0000-0000-0000-000000000000)
[2015-02-18 12:36:12.376931] E [socket.c:2267:socket_connect_finish]
0-drslk-prod-client-10: connection to 10.10.11.23:24007 failed
(Connection refused)
[2015-02-18 12:36:12.379296] W
[client-rpc-fops.c:2766:client3_3_lookup_cbk] 0-drslk-prod-client-10:
remote operation failed: Transport endpoint is not connected. Path:
(null) (00000000-0000-0000-0000-000000000000)
[2015-02-18 12:36:12.379700] W
[client-rpc-fops.c:2766:client3_3_lookup_cbk] 0-drslk-prod-client-10:
remote operation failed: Transport endpoint is not connected. Path:
(null) (00000000-0000-0000-0000-000000000000)
[2015-02-18 13:10:52.759736] E
[client-handshake.c:1496:client_query_portmap_cbk]
0-drslk-prod-client-10: failed to get the port number for remote
subvolume. Please run 'gluster volume status' on server to see if brick
process is running.
[2015-02-18 13:10:52.759796] I [client.c:2215:client_rpc_notify]
0-drslk-prod-client-10: disconnected from drslk-prod-client-10. Client
process will keep trying to connect to glusterd until brick's port is
available
[2015-02-18 13:11:02.897307] I [rpc-clnt.c:1761:rpc_clnt_reconfig]
0-drslk-prod-client-10: changing port to 49349 (from 0)
[2015-02-18 13:11:02.898097] I
[client-handshake.c:1413:select_server_supported_programs]
0-drslk-prod-client-10: Using Program GlusterFS 3.3, Num (1298437),
Version (330)
[2015-02-18 13:11:02.898446] I
[client-handshake.c:1200:client_setvolume_cbk] 0-drslk-prod-client-10:
Connected to drslk-prod-client-10, attached to remote volume
'/GLUSTERFS/drslk-prod'.
[2015-02-18 13:11:02.898460] I
[client-handshake.c:1210:client_setvolume_cbk] 0-drslk-prod-client-10:
Server and Client lk-version numbers are not same, reopening the fds
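
To see how long a brick was unreachable in a log like the one above, the
"disconnected from" and "Connected to" messages can be paired up per
client subvolume. Below is a rough sketch in Python, not an official
Gluster tool. It assumes the log lines look like the excerpt above
(timestamp in brackets, one-letter level, subvolume name followed by a
colon) and that lines may have been re-wrapped by a mail client. The
function names parse_records and outages are made up for illustration.

```python
import re
from datetime import datetime

# Start of a log record: "[timestamp] LEVEL ". The opening bracket is treated
# as optional because it can get lost when logs are pasted into mail.
RECORD_START = re.compile(
    r"\[?(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\] ([A-Z]) "
)


def parse_records(text):
    """Yield (timestamp, level, message) from a possibly line-wrapped log."""
    flat = " ".join(text.split())  # undo line wrapping
    matches = list(RECORD_START.finditer(flat))
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(flat)
        ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S.%f")
        yield ts, m.group(2), flat[m.end():end].strip()


def outages(text):
    """Return (subvolume, down_at, up_at) for each disconnect/reconnect pair."""
    down = {}    # subvolume -> time of the first unanswered disconnect
    result = []
    for ts, _level, msg in parse_records(text):
        m = re.search(r"(\S+): disconnected from ", msg)
        if m:
            down.setdefault(m.group(1), ts)  # keep the earliest disconnect
            continue
        m = re.search(r"(\S+): Connected to ", msg)
        if m and m.group(1) in down:
            result.append((m.group(1), down.pop(m.group(1)), ts))
    return result
```

Calling outages() on the text of the client log and subtracting down_at
from up_at gives the length of each outage. For the excerpt above, that
pairing would be the disconnect of 0-drslk-prod-client-10 at 12:36:12
and the reconnect at 13:11:02, roughly 35 minutes.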


Can you provide the gluster volume configuration details?

It does look like frame-timeout for the volume has been set to 60. Is there any specific reason? Normally altering the frame-timeout is not recommended.

-Vijay





