Re: Gluster errors create zombie processes [LOGS ATTACHED]


 



On 03/08/2015 09:36 AM, Przemysław Mroczek wrote:
I don't have the volfiles; they are not on our machines. As I said previously,
we have no influence over the Gluster servers.

I saw a graph in the logs that looks similar to a volume file. I will
paste it here, but we don't really have any influence over it. We are just
using the client to connect to Gluster servers that we are not in control of.


I would recommend not altering the default frame timeout.


Btw, do you think that running different versions of the Gluster client and
the Gluster server could be an issue here?


It could potentially be. What versions are you using on the servers and on the client?

-Vijay

2015-03-08 1:29 GMT+01:00 Vijay Bellur <vbellur@xxxxxxxxxx>:

    On 03/07/2015 06:20 PM, Przemysław Mroczek wrote:

        Hi guys,

        We have a Rails app that uses Gluster as our distributed file
        system. The Gluster servers are hosted independently as part of a
        deal with another company; we have no influence over them. We
        connect to them using the Gluster native client.

        We tried to resolve this issue with help from the admins of the
        company that hosts our Gluster servers, but they say it's a client
        issue, and we have run out of ideas about how that's possible when
        we are not doing anything special here.

        Information about the independent Gluster servers:
        - Version: 3.6.0.42.1
        - They are using Red Hat
        - They are an enterprise, so they always use older versions

        Our servers:
        System version: Ubuntu 14.04
        Our gluster client version: 3.6.2

        The exact problem is that errors in Gluster often (a couple of times
        a week) cause processes to become zombies. It happens with our
        application server (unicorn), nginx, and our crawling script that
        runs as a daemon.

        Our fstab file:

        10.10.11.17:/drslk-prod     /mnt/storage    glusterfs    defaults,_netdev,nobootwait,fetch-attempts=10 0 0
        10.10.11.17:/drslk-backup   /mnt/backup     glusterfs    defaults,_netdev,nobootwait,fetch-attempts=10 0 0
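To make the client-side mount configuration explicit, here is a minimal Python sketch that splits the two fstab entries above into their six fstab(5) fields and expands the comma-separated options. It shows that the only GlusterFS-specific client option is `fetch-attempts=10` and that these entries do not set a frame timeout. The helper names `parse_options` and `parse_fstab` are hypothetical, written only for illustration.

```python
# Minimal sketch (hypothetical helpers): split the fstab entries quoted above
# into their six fstab(5) fields and expand the comma-separated mount options.
from __future__ import annotations

from typing import Optional

FSTAB = """\
10.10.11.17:/drslk-prod     /mnt/storage    glusterfs    defaults,_netdev,nobootwait,fetch-attempts=10 0 0
10.10.11.17:/drslk-backup   /mnt/backup     glusterfs    defaults,_netdev,nobootwait,fetch-attempts=10 0 0
"""


def parse_options(opts: str) -> dict[str, Optional[str]]:
    """Turn 'a,b=1' into {'a': None, 'b': '1'} (flags map to None)."""
    result: dict[str, Optional[str]] = {}
    for item in opts.split(","):
        key, sep, value = item.partition("=")
        result[key] = value if sep else None
    return result


def parse_fstab(text: str) -> list[dict]:
    """Parse non-comment fstab lines into dicts, one per mount."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        spec, mountpoint, fstype, opts, dump, passno = line.split()
        entries.append({
            "spec": spec,  # server:/volume
            "mountpoint": mountpoint,
            "fstype": fstype,
            "options": parse_options(opts),
            "dump": int(dump),
            "passno": int(passno),
        })
    return entries


if __name__ == "__main__":
    for entry in parse_fstab(FSTAB):
        print(entry["spec"], entry["mountpoint"], entry["options"])
```

Parsing the options into a dictionary makes it easy to check for a specific setting, for example `"frame-timeout" in entry["options"]`, which is `False` for both mounts here.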

        Logs from gluster:

        [2015-02-18 12:36:12.375695] E [rpc-clnt.c:362:saved_frames_unwind] (--> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x186)[0x7fb41ddeada6] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7fb41dbc1c7e] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7fb41dbc1d8e] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x82)[0x7fb41dbc3602] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0x48)[0x7fb41dbc3d98] ))))) 0-drslk-prod-client-10: forced unwinding frame type(GlusterFS 3.3) op(LOOKUP(27)) called at 2015-02-18 12:36:12.361489 (xid=0x5d475da)
        [2015-02-18 12:36:12.375765] W [client-rpc-fops.c:2766:client3_3_lookup_cbk] 0-drslk-prod-client-10: remote operation failed: Transport endpoint is not connected. Path: /system/posts/00/00/71/77/59.jpg (2ad81c2b-a141-478d-9dd4-253345edbceb)
        [2015-02-18 12:36:12.376288] E [rpc-clnt.c:362:saved_frames_unwind] (--> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x186)[0x7fb41ddeada6] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7fb41dbc1c7e] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7fb41dbc1d8e] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x82)[0x7fb41dbc3602] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0x48)[0x7fb41dbc3d98] ))))) 0-drslk-prod-client-10: forced unwinding frame type(GlusterFS 3.3) op(LOOKUP(27)) called at 2015-02-18 12:36:12.361858 (xid=0x5d475db)
        [2015-02-18 12:36:12.376355] W [client-rpc-fops.c:2766:client3_3_lookup_cbk] 0-drslk-prod-client-10: remote operation failed: Transport endpoint is not connected. Path: /system/posts/00/00/08 (f5c33a99-719e-4ea2-ad1f-33b893af103d)
        [2015-02-18 12:36:12.376711] I [socket.c:3292:socket_submit_request] 0-drslk-prod-client-10: not connected (priv->connected = 0)
        [2015-02-18 12:36:12.376749] W [rpc-clnt.c:1562:rpc_clnt_submit] 0-drslk-prod-client-10: failed to submit rpc-request (XID: 0x5d475dc Program: GlusterFS 3.3, ProgVers: 330, Proc: 27) to rpc-transport (drslk-prod-client-10)
        [2015-02-18 12:36:12.376814] W [client-rpc-fops.c:2766:client3_3_lookup_cbk] 0-drslk-prod-client-10: remote operation failed: Transport endpoint is not connected. Path: (null) (00000000-0000-0000-0000-000000000000)
        [2015-02-18 12:36:12.376829] I [client.c:2215:client_rpc_notify] 0-drslk-prod-client-10: disconnected from drslk-prod-client-10. Client process will keep trying to connect to glusterd until brick's port is available
        [2015-02-18 12:36:12.376834] W [rpc-clnt.c:1562:rpc_clnt_submit] 0-drslk-prod-client-10: failed to submit rpc-request (XID: 0x5d475dd Program: GlusterFS 3.3, ProgVers: 330, Proc: 27) to rpc-transport (drslk-prod-client-10)
        [2015-02-18 12:36:12.376906] W [client-rpc-fops.c:2766:client3_3_lookup_cbk] 0-drslk-prod-client-10: remote operation failed: Transport endpoint is not connected. Path: (null) (00000000-0000-0000-0000-000000000000)
        [2015-02-18 12:36:12.376931] E [socket.c:2267:socket_connect_finish] 0-drslk-prod-client-10: connection to 10.10.11.23:24007 failed (Connection refused)
        [2015-02-18 12:36:12.379296] W [client-rpc-fops.c:2766:client3_3_lookup_cbk] 0-drslk-prod-client-10: remote operation failed: Transport endpoint is not connected. Path: (null) (00000000-0000-0000-0000-000000000000)
        [2015-02-18 12:36:12.379700] W [client-rpc-fops.c:2766:client3_3_lookup_cbk] 0-drslk-prod-client-10: remote operation failed: Transport endpoint is not connected. Path: (null) (00000000-0000-0000-0000-000000000000)
        [2015-02-18 13:10:52.759736] E [client-handshake.c:1496:client_query_portmap_cbk] 0-drslk-prod-client-10: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running.
        [2015-02-18 13:10:52.759796] I [client.c:2215:client_rpc_notify] 0-drslk-prod-client-10: disconnected from drslk-prod-client-10. Client process will keep trying to connect to glusterd until brick's port is available
        [2015-02-18 13:11:02.897307] I [rpc-clnt.c:1761:rpc_clnt_reconfig] 0-drslk-prod-client-10: changing port to 49349 (from 0)
        [2015-02-18 13:11:02.898097] I [client-handshake.c:1413:select_server_supported_programs] 0-drslk-prod-client-10: Using Program GlusterFS 3.3, Num (1298437), Version (330)
        [2015-02-18 13:11:02.898446] I [client-handshake.c:1200:client_setvolume_cbk] 0-drslk-prod-client-10: Connected to drslk-prod-client-10, attached to remote volume '/GLUSTERFS/drslk-prod'.
        [2015-02-18 13:11:02.898460] I [client-handshake.c:1210:client_setvolume_cbk] 0-drslk-prod-client-10: Server and Client lk-version numbers are not same, reopening the fds
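The log above shows `drslk-prod-client-10` disconnecting at 12:36:12 (connection to 10.10.11.23:24007 refused) and only reattaching to '/GLUSTERFS/drslk-prod' at 13:11:02. A minimal Python sketch for measuring such windows is below. It assumes the client log line shape seen here, `[timestamp] LEVEL [file:line:function] message`, and the functions `parse` and `outages` are hypothetical helpers, not part of GlusterFS. The sample lines are copied from the log above.

```python
# Minimal sketch (hypothetical helpers, not part of GlusterFS): parse client
# log lines of the form "[timestamp] LEVEL [file:line:function] message" and
# measure how long a client translator stayed disconnected.
from __future__ import annotations

import re
from datetime import datetime, timedelta

LOG_RE = re.compile(
    r"^\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{6})\] "
    r"(?P<level>[A-Z]) \[(?P<src>[^\]]+)\] (?P<msg>.*)$"
)

# Lines copied from the client log quoted above.
SAMPLE = """\
[2015-02-18 12:36:12.375765] W [client-rpc-fops.c:2766:client3_3_lookup_cbk] 0-drslk-prod-client-10: remote operation failed: Transport endpoint is not connected. Path: /system/posts/00/00/71/77/59.jpg (2ad81c2b-a141-478d-9dd4-253345edbceb)
[2015-02-18 12:36:12.376829] I [client.c:2215:client_rpc_notify] 0-drslk-prod-client-10: disconnected from drslk-prod-client-10. Client process will keep trying to connect to glusterd until brick's port is available
[2015-02-18 13:10:52.759796] I [client.c:2215:client_rpc_notify] 0-drslk-prod-client-10: disconnected from drslk-prod-client-10. Client process will keep trying to connect to glusterd until brick's port is available
[2015-02-18 13:11:02.898446] I [client-handshake.c:1200:client_setvolume_cbk] 0-drslk-prod-client-10: Connected to drslk-prod-client-10, attached to remote volume '/GLUSTERFS/drslk-prod'.
"""


def parse(lines):
    """Yield (timestamp, level, source, message) for lines matching LOG_RE."""
    for line in lines:
        m = LOG_RE.match(line)
        if m:
            ts = datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S.%f")
            yield ts, m["level"], m["src"], m["msg"]


def outages(lines, client: str) -> list[timedelta]:
    """Time from the first 'disconnected from <client>' to the next 'Connected to <client>'."""
    down_since = None
    result = []
    for ts, _level, _src, msg in parse(lines):
        if f"disconnected from {client}." in msg and down_since is None:
            down_since = ts  # ignore repeated disconnect notices while down
        elif f"Connected to {client}," in msg and down_since is not None:
            result.append(ts - down_since)
            down_since = None
    return result


if __name__ == "__main__":
    for gap in outages(SAMPLE.splitlines(), "drslk-prod-client-10"):
        print(gap)  # 0:34:50.521617
```

Run against these lines, it reports a window of roughly 34 minutes and 50 seconds during which this client could not reach the brick.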


    Can you provide the gluster volume configuration details?

    It does look like the frame-timeout for the volume has been set to 60
    seconds. Is there any specific reason? Normally, altering the
    frame-timeout is not recommended.

    -Vijay
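For context on the frame-timeout point: `network.frame-timeout` is the volume option that sets how long the client waits for a reply to an individual operation before declaring it dead and failing it, and its default is 1800 seconds. The toy Python model below is a hypothetical illustration, not GlusterFS code; it ignores that the real check runs periodically rather than continuously, and the pending operations in it are made-up inputs. It shows which pending operations would be failed under a 60-second timeout compared with the default.

```python
# Toy model (hypothetical, not GlusterFS source): an operation ("frame") that
# has waited longer than frame-timeout without a reply is declared dead and
# failed back to the caller.
from __future__ import annotations

DEFAULT_FRAME_TIMEOUT = 1800  # seconds; GlusterFS default for network.frame-timeout


def expired_frames(pending: dict[str, float], now: float,
                   frame_timeout: float = DEFAULT_FRAME_TIMEOUT) -> list[str]:
    """Return ids of pending frames older than frame_timeout at time `now`."""
    return sorted(xid for xid, sent_at in pending.items()
                  if now - sent_at > frame_timeout)


if __name__ == "__main__":
    # Hypothetical inputs: three operations sent 10 s, 90 s and 2000 s ago.
    now = 10_000.0
    pending = {"op-a": now - 10, "op-b": now - 90, "op-c": now - 2000}
    print(expired_frames(pending, now, frame_timeout=60))  # ['op-b', 'op-c']
    print(expired_frames(pending, now))                    # ['op-c']
```

With a 60-second timeout, any server stall longer than a minute starts failing in-flight operations, whereas the default tolerates stalls of up to 30 minutes.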



_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users




