Re: [ovirt-users] timeouts

"paf1@xxxxxxxx" <paf1@xxxxxxxx> · Fri, 27 Nov 2015 13:46:25 +0100



    Hi, 

    all glusterd daemons were running correctly at that time, and there
    are no firewall/iptables restrictions.

    But the "not connected" bricks keep changing over time without any
    intervention.

    It looks like glusterd has unstable cross-node communication,
    especially when the bricks are on a different LAN range than the
    nodes in the oVirt environment

    (volume bricks in the 16.0.0.0 net and oVirt nodes in the 172.0.0.0 net).

    So I decided to reinstall the whole cluster, but I'm afraid that these
    problems will occur again. I will let you know.

    
    Regards, and thanks for your answers

    Pavel

    
    On 27.11.2015 10:16, knarra wrote:

    
      On 11/27/2015 11:04 AM, knarra wrote:

      
        Hi Paf1,

          
              It looks like glusterd does not start up on one node when
          you reboot the nodes, and because of this that node gets
          disconnected from the other nodes (that is what I see in the
          logs). After a reboot, once your systems are up and running,
          can you check whether glusterd is running on all the nodes?
          Can you also let me know which build of gluster you are using?
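One way to run that check across all nodes is sketched below. It assumes passwordless ssh from an admin host and a systemd unit named "glusterd" on each node; neither is stated in this thread. The host names are the brick hosts shown in the "gluster volume info" output quoted further down, and the helper names are hypothetical.

```python
import subprocess

# Brick host names as they appear in the "gluster volume info" output in this thread.
NODES = ["1hp1-SAN", "1hp2-SAN", "2hp1-SAN", "2hp2-SAN"]


def glusterd_active(node, run=subprocess.run):
    """Return True if `systemctl is-active glusterd` prints "active" on node.

    Assumptions: passwordless ssh, and a systemd unit called "glusterd".
    `run` is injectable so the function can be tried without real hosts.
    """
    result = run(
        ["ssh", node, "systemctl", "is-active", "glusterd"],
        capture_output=True,
        text=True,
    )
    return result.stdout.strip() == "active"


def inactive_nodes(nodes=NODES, run=subprocess.run):
    """List the nodes on which glusterd is not reported as active."""
    return [n for n in nodes if not glusterd_active(n, run=run)]
```

Calling `inactive_nodes()` right after a reboot cycle would list the hosts where the daemon did not come back, which is the situation described above.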

          
              For more info please read, http://www.gluster.org/pipermail/gluster-users.old/2015-June/022377.html
          - (please ignore this line)

        
          Thanks

          kasturi

          
          On 11/27/2015 10:52 AM, Sahina Bose wrote:

        
          [+ gluster-users]

          
          On 11/26/2015 08:37 PM, paf1@xxxxxxxx wrote:

          
            Hello, 

            can anybody help me with these timeouts?

            The volumes are not active yet (bricks down).

            A description of the gluster setup is below.

            
            /var/log/glusterfs/etc-glusterfs-glusterd.vol.log

            [2015-11-26 14:44:47.174221] I [MSGID: 106004]
            [glusterd-handler.c:5065:__glusterd_peer_rpc_notify]
            0-management: Peer <1hp1-SAN>
            (<87fc7db8-aba8-41f2-a1cd-b77e83b17436>), in state
            <Peer in Cluster>, has disconnected from glusterd.

            [2015-11-26 14:44:47.174354] W
            [glusterd-locks.c:681:glusterd_mgmt_v3_unlock]
            (-->/usr/lib64/glusterfs/3.7.6/xlator/mgmt/glusterd.so(glusterd_big_locked_notify+0x4c)
            [0x7fb7039d44dc]
            -->/usr/lib64/glusterfs/3.7.6/xlator/mgmt/glusterd.so(__glusterd_peer_rpc_notify+0x162)
            [0x7fb7039de542]
            -->/usr/lib64/glusterfs/3.7.6/xlator/mgmt/glusterd.so(glusterd_mgmt_v3_unlock+0x58a)
            [0x7fb703a79b4a] ) 0-management: Lock for vol 1HP12-P1 not
            held

            [2015-11-26 14:44:47.174444] W
            [glusterd-locks.c:681:glusterd_mgmt_v3_unlock]
            (-->/usr/lib64/glusterfs/3.7.6/xlator/mgmt/glusterd.so(glusterd_big_locked_notify+0x4c)
            [0x7fb7039d44dc]
            -->/usr/lib64/glusterfs/3.7.6/xlator/mgmt/glusterd.so(__glusterd_peer_rpc_notify+0x162)
            [0x7fb7039de542]
            -->/usr/lib64/glusterfs/3.7.6/xlator/mgmt/glusterd.so(glusterd_mgmt_v3_unlock+0x58a)
            [0x7fb703a79b4a] ) 0-management: Lock for vol 1HP12-P3 not
            held

            [2015-11-26 14:44:47.174521] W
            [glusterd-locks.c:681:glusterd_mgmt_v3_unlock]
            (-->/usr/lib64/glusterfs/3.7.6/xlator/mgmt/glusterd.so(glusterd_big_locked_notify+0x4c)
            [0x7fb7039d44dc]
            -->/usr/lib64/glusterfs/3.7.6/xlator/mgmt/glusterd.so(__glusterd_peer_rpc_notify+0x162)
            [0x7fb7039de542]
            -->/usr/lib64/glusterfs/3.7.6/xlator/mgmt/glusterd.so(glusterd_mgmt_v3_unlock+0x58a)
            [0x7fb703a79b4a] ) 0-management: Lock for vol 2HP12-P1 not
            held

            [2015-11-26 14:44:47.174662] W
            [glusterd-locks.c:681:glusterd_mgmt_v3_unlock]
            (-->/usr/lib64/glusterfs/3.7.6/xlator/mgmt/glusterd.so(glusterd_big_locked_notify+0x4c)
            [0x7fb7039d44dc]
            -->/usr/lib64/glusterfs/3.7.6/xlator/mgmt/glusterd.so(__glusterd_peer_rpc_notify+0x162)
            [0x7fb7039de542]
            -->/usr/lib64/glusterfs/3.7.6/xlator/mgmt/glusterd.so(glusterd_mgmt_v3_unlock+0x58a)
            [0x7fb703a79b4a] ) 0-management: Lock for vol 2HP12-P3 not
            held

            [2015-11-26 14:44:47.174532] W [MSGID: 106118]
            [glusterd-handler.c:5087:__glusterd_peer_rpc_notify]
            0-management: Lock not released for 2HP12-P1

            [2015-11-26 14:44:47.174675] W [MSGID: 106118]
            [glusterd-handler.c:5087:__glusterd_peer_rpc_notify]
            0-management: Lock not released for 2HP12-P3

            [2015-11-26 14:44:49.423334] I [MSGID: 106488]
            [glusterd-handler.c:1472:__glusterd_handle_cli_get_volume]
            0-glusterd: Received get vol req

            The message "I [MSGID: 106488]
            [glusterd-handler.c:1472:__glusterd_handle_cli_get_volume]
            0-glusterd: Received get vol req" repeated 4 times between
            [2015-11-26 14:44:49.423334] and [2015-11-26
            14:44:49.429781]

            [2015-11-26 14:44:51.148711] I [MSGID: 106163]
            [glusterd-handshake.c:1193:__glusterd_mgmt_hndsk_versions_ack]
            0-management: using the op-version 30702

            [2015-11-26 14:44:52.177266] W
            [socket.c:869:__socket_keepalive] 0-socket: failed to set
            TCP_USER_TIMEOUT -1000 on socket 12, Invalid argument

            [2015-11-26 14:44:52.177291] E
            [socket.c:2965:socket_connect] 0-management: Failed to set
            keep-alive: Invalid argument

            [2015-11-26 14:44:53.180426] W
            [socket.c:869:__socket_keepalive] 0-socket: failed to set
            TCP_USER_TIMEOUT -1000 on socket 17, Invalid argument

            [2015-11-26 14:44:53.180447] E
            [socket.c:2965:socket_connect] 0-management: Failed to set
            keep-alive: Invalid argument

            [2015-11-26 14:44:52.395468] I [MSGID: 106163]
            [glusterd-handshake.c:1193:__glusterd_mgmt_hndsk_versions_ack]
            0-management: using the op-version 30702

            [2015-11-26 14:44:54.851958] I [MSGID: 106488]
            [glusterd-handler.c:1472:__glusterd_handle_cli_get_volume]
            0-glusterd: Received get vol req

            [2015-11-26 14:44:57.183969] W
            [socket.c:869:__socket_keepalive] 0-socket: failed to set
            TCP_USER_TIMEOUT -1000 on socket 19, Invalid argument

            [2015-11-26 14:44:57.183990] E
            [socket.c:2965:socket_connect] 0-management: Failed to set
            keep-alive: Invalid argument
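To see at a glance which peers dropped and how often the keep-alive setup failed, the log above can be summarized with a small script. This is only a sketch: the sample lines are copied from the excerpt above and re-joined into single lines, the regular expressions are fitted to those lines alone, and the function name is hypothetical.

```python
import re
from collections import Counter

# Lines copied from the glusterd log excerpt above, re-joined after mail wrapping.
SAMPLE_LOG = """\
[2015-11-26 14:44:47.174221] I [MSGID: 106004] [glusterd-handler.c:5065:__glusterd_peer_rpc_notify] 0-management: Peer <1hp1-SAN> (<87fc7db8-aba8-41f2-a1cd-b77e83b17436>), in state <Peer in Cluster>, has disconnected from glusterd.
[2015-11-26 14:44:52.177266] W [socket.c:869:__socket_keepalive] 0-socket: failed to set TCP_USER_TIMEOUT -1000 on socket 12, Invalid argument
[2015-11-26 14:44:52.177291] E [socket.c:2965:socket_connect] 0-management: Failed to set keep-alive: Invalid argument
[2015-11-26 14:44:53.180426] W [socket.c:869:__socket_keepalive] 0-socket: failed to set TCP_USER_TIMEOUT -1000 on socket 17, Invalid argument
"""

PEER_DOWN = re.compile(r"Peer <([^>]+)> .*has disconnected from glusterd")
KEEPALIVE = re.compile(r"failed to set TCP_USER_TIMEOUT (-?\d+) on socket (\d+)")


def summarize(log_text):
    """Return (disconnect count per peer, list of (timeout value, socket fd))."""
    peers = Counter()
    timeouts = []
    for line in log_text.splitlines():
        match = PEER_DOWN.search(line)
        if match:
            peers[match.group(1)] += 1
        match = KEEPALIVE.search(line)
        if match:
            timeouts.append((int(match.group(1)), int(match.group(2))))
    return peers, timeouts
```

Run against the full /var/log/glusterfs/etc-glusterfs-glusterd.vol.log on each node, the per-peer counts would show whether the disconnects cluster around one host or move between hosts.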

            
            After the volumes were created everything worked fine
            (volumes up), but then, after several reboots (yum updates),
            the volumes failed due to timeouts.

            
            Gluster description:

            
            4 nodes with 4 volumes, replica 2

            oVirt 3.6 - the latest

            gluster 3.7.6 - the latest

            vdsm 4.17.999 - from git repo

            oVirt - mgmt.nodes 172.16.0.0

            oVirt - bricks 16.0.0.0 ( "SAN" - defined as "gluster" net)

            Network works fine, no lost packets
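As a side check on the two ranges listed above, Python's ipaddress module can report whether each address is in a reserved private range. The sketch below uses only the network addresses as written in the description (no prefix lengths are given there), and the function name is hypothetical. Whether the choice of range has anything to do with the disconnects is not established in this thread.

```python
import ipaddress


def classify(addr):
    """Return "private" if addr is in a reserved private range, else "public"."""
    return "private" if ipaddress.ip_address(addr).is_private else "public"


# Addresses exactly as written in the description above.
for label, addr in [("oVirt mgmt nodes", "172.16.0.0"), ("gluster bricks", "16.0.0.0")]:
    print(f"{label}: {addr} is {classify(addr)}")
```

172.16.0.0 lies inside the RFC 1918 block 172.16.0.0/12, while 16.0.0.0 is ordinary public address space.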

            
            # gluster volume status 

            Staging failed on 2hp1-SAN. Please check log file for
            details.

            Staging failed on 1hp2-SAN. Please check log file for
            details.

            Staging failed on 2hp2-SAN. Please check log file for
            details.
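The peers named in those "Staging failed" lines can be pulled out mechanically, which helps when comparing repeated runs to see whether the failing set changes. A minimal sketch, with the pattern fitted only to the three lines shown above and a hypothetical function name:

```python
import re

# Output lines copied from the "gluster volume status" run above, re-joined.
SAMPLE_OUTPUT = """\
Staging failed on 2hp1-SAN. Please check log file for details.
Staging failed on 1hp2-SAN. Please check log file for details.
Staging failed on 2hp2-SAN. Please check log file for details.
"""

STAGING_FAILED = re.compile(r"Staging failed on (\S+)\. Please check")


def failed_peers(cli_output):
    """Return the sorted, de-duplicated peers reported as failing staging."""
    return sorted(set(STAGING_FAILED.findall(cli_output)))
```

For the output above, `failed_peers(SAMPLE_OUTPUT)` gives the three hosts 1hp2-SAN, 2hp1-SAN and 2hp2-SAN.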

            
            # gluster volume info

            
            Volume Name: 1HP12-P1

            Type: Replicate

            Volume ID: 6991e82c-9745-4203-9b0a-df202060f455

            Status: Started

            Number of Bricks: 1 x 2 = 2

            Transport-type: tcp

            Bricks:

            Brick1: 1hp1-SAN:/STORAGE/p1/G

            Brick2: 1hp2-SAN:/STORAGE/p1/G

            Options Reconfigured:

            performance.readdir-ahead: on

            
            Volume Name: 1HP12-P3

            Type: Replicate

            Volume ID: 8bbdf0cb-f9b9-4733-8388-90487aa70b30

            Status: Started

            Number of Bricks: 1 x 2 = 2

            Transport-type: tcp

            Bricks:

            Brick1: 1hp1-SAN:/STORAGE/p3/G

            Brick2: 1hp2-SAN:/STORAGE/p3/G

            Options Reconfigured:

            performance.readdir-ahead: on

            
            Volume Name: 2HP12-P1

            Type: Replicate

            Volume ID: e2cd5559-f789-4636-b06a-683e43e0d6bb

            Status: Started

            Number of Bricks: 1 x 2 = 2

            Transport-type: tcp

            Bricks:

            Brick1: 2hp1-SAN:/STORAGE/p1/G

            Brick2: 2hp2-SAN:/STORAGE/p1/G

            Options Reconfigured:

            performance.readdir-ahead: on

            
            Volume Name: 2HP12-P3

            Type: Replicate

            Volume ID: b5300c68-10b3-4ebe-9f29-805d3a641702

            Status: Started

            Number of Bricks: 1 x 2 = 2

            Transport-type: tcp

            Bricks:

            Brick1: 2hp1-SAN:/STORAGE/p3/G

            Brick2: 2hp2-SAN:/STORAGE/p3/G

            Options Reconfigured:

            performance.readdir-ahead: on

            
            Regards, and thanks for any hints

            Paf1

            
            _______________________________________________
Users mailing list
Users@xxxxxxxxx
http://lists.ovirt.org/mailman/listinfo/users

          

      
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users