Re: Change transport-type on volume from tcp to rdma, tcp

On 07/22/2015 04:51 AM, Geoffrey Letessier wrote:
Hi Niels,

Thanks for replying. 

In fact, after checking the logs, I discovered that GlusterFS tried to connect to a brick on a TCP (or RDMA) port allocated to another volume… (bug?)
For example, here is an extract of my workdir.log file:
[2015-07-21 21:34:01.820188] E [socket.c:2332:socket_connect_finish] 0-vol_workdir_amd-client-0: connection to 10.0.4.1:49161 failed (Connection refused)
[2015-07-21 21:34:01.822563] E [socket.c:2332:socket_connect_finish] 0-vol_workdir_amd-client-2: connection to 10.0.4.1:49162 failed (Connection refused)

But those two ports (49161 and 49162) belong to my vol_home volume, not to vol_workdir_amd.
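(To double-check which process actually owns such a port, something like the following should work on the storage node; 49161 is just the port taken from the log above:)

# gluster volume status all | grep 49161
# ss -tlnp | grep 49161

(netstat -tlnp should do the same on older systems.)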

Now, after restarting all the glusterd daemons synchronously (pdsh -w cl-storage[1-4] service glusterd restart), everything seems to be back to normal (size, write permissions, etc.)

But a few minutes later I noticed a strange thing I have been seeing since I upgraded my storage cluster from 3.5.3 to 3.7.2-3: when I try to mount some volumes (particularly my vol_shared volume, a replicated volume), my system can hang… And because I use it in my bashrc file for my environment modules, I need to restart the node. The same happens if I run df on the mounted volume (when it doesn't already hang during the mount).
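As a stop-gap, I am considering guarding the access in my bashrc so that a hung volume cannot block the login shell; a rough sketch (the /shared mount point and the modules init path are placeholders for my actual setup):

if timeout 5 stat -t /shared >/dev/null 2>&1; then
    # the volume answered within 5 seconds, safe to source from it
    source /shared/modules/init/bash
else
    echo "warning: /shared unreachable, skipping environment modules" >&2
fi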

With the tcp transport type, the situation seems more stable.

In addition, if I restart a storage node, I can't use the Gluster CLI (it also hangs).

Do you have any idea?

Are you using a bash script to start/mount the volume? If so, add a sleep after the volume start and after the mount, to allow all the processes to start properly, because the RDMA protocol takes some time to initialize its resources.
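For example, a minimal sketch (the volume name, mount point and sleep durations are placeholders to adapt to your setup):

#!/bin/bash
# start the volume, then give the brick processes time to
# initialize their RDMA resources before mounting
gluster volume start vol_workdir_amd
sleep 10

# mount over RDMA, and wait again before anything uses the mount
mount -t glusterfs -o transport=rdma localhost:/vol_workdir_amd /workdir
sleep 5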

Regards
Rafi KC




Once again, thanks a lot for your help,
Geoffrey

------------------------------------------------------
Geoffrey Letessier
Responsable informatique & ingénieur système
UPR 9080 - CNRS - Laboratoire de Biochimie Théorique
Institut de Biologie Physico-Chimique
13, rue Pierre et Marie Curie - 75005 Paris
Tel: 01 58 41 50 93 - eMail: geoffrey.letessier@xxxxxxx

On 21 Jul 2015, at 23:49, Niels de Vos <ndevos@xxxxxxxxxx> wrote:

On Tue, Jul 21, 2015 at 11:20:20PM +0200, Geoffrey Letessier wrote:
Hello Soumya, Hello everybody,

network.ping-timeout was set to 42 seconds. I set it to 0, but it made
no difference. The problem was that, after re-setting the transport-type
to rdma,tcp, some bricks went down after a few minutes… Despite
restarting the volumes, after a few minutes some [other/different]
bricks went down again.

I'm not sure if the ping-timeout is handled differently when RDMA is
used. Adding two of the guys who know RDMA well on CC.

Now, after re-creating my volume, the bricks stay alive but, oddly, I'm
not able to write to my volume. In addition, I defined a distributed
volume with 2 servers and 4 bricks of 250GB each, yet my final volume
seems to be sized at only 500GB… It's baffling…

As seen further below, the 500GB volume size is caused by two unreachable
bricks. When bricks are not reachable, the client cannot detect their
size, and therefore 2x 250GB is missing.

It is unclear to me why writing to a pure distributed volume fails. When
a brick is not reachable and a file should be created there, the file
would normally get created on another brick. When the brick that should
have the file comes back online and a new lookup for the file is done, a
so-called "link file" is created, which points to the file on the other
brick. I guess the failure has to do with the connection issues, and I
would suggest getting those solved first.
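For reference, such link files can be spotted directly on a brick; a quick sketch, using the brick path from your output below and a hypothetical file name. They show up as zero-byte, mode ---------T (sticky bit) entries carrying an xattr that names the subvolume holding the real file:

# ls -l /export/brick_workdir/brick1/data/test
# getfattr -n trusted.glusterfs.dht.linkto /export/brick_workdir/brick1/data/test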

HTH,
Niels


Here you can find some information:
# gluster volume status vol_workdir_amd
Status of volume: vol_workdir_amd
Gluster process                                      TCP Port  RDMA Port  Online  Pid
--------------------------------------------------------------------------------------
Brick ib-storage1:/export/brick_workdir/brick1/data  49185     49186      Y       23098
Brick ib-storage3:/export/brick_workdir/brick1/data  49158     49159      Y       3886
Brick ib-storage1:/export/brick_workdir/brick2/data  49187     49188      Y       23117
Brick ib-storage3:/export/brick_workdir/brick2/data  49160     49161      Y       3905

# gluster volume info vol_workdir_amd

Volume Name: vol_workdir_amd
Type: Distribute
Volume ID: 087d26ea-c6df-4cbe-94af-ecd87b59aedb
Status: Started
Number of Bricks: 4
Transport-type: tcp,rdma
Bricks:
Brick1: ib-storage1:/export/brick_workdir/brick1/data
Brick2: ib-storage3:/export/brick_workdir/brick1/data
Brick3: ib-storage1:/export/brick_workdir/brick2/data
Brick4: ib-storage3:/export/brick_workdir/brick2/data
Options Reconfigured:
performance.readdir-ahead: on

# pdsh -w storage[1,3] df -h /export/brick_workdir/brick{1,2}
storage3: Filesystem                            Size  Used Avail Use% Mounted on
storage3: /dev/mapper/st--block1-blk1--workdir  250G   34M  250G   1% /export/brick_workdir/brick1
storage3: /dev/mapper/st--block2-blk2--workdir  250G   34M  250G   1% /export/brick_workdir/brick2
storage1: Filesystem                            Size  Used Avail Use% Mounted on
storage1: /dev/mapper/st--block1-blk1--workdir  250G   33M  250G   1% /export/brick_workdir/brick1
storage1: /dev/mapper/st--block2-blk2--workdir  250G   33M  250G   1% /export/brick_workdir/brick2

# df -h /workdir/
Filesystem                      Size  Used Avail Use% Mounted on
localhost:vol_workdir_amd.rdma  500G   67M  500G   1% /workdir

# touch /workdir/test
touch: cannot touch '/workdir/test': No such file or directory

# tail -30l /var/log/glusterfs/workdir.log
Host Unreachable, Check your connection with IPoIB
[2015-07-21 21:10:33.927673] W [rdma.c:1263:gf_rdma_cm_event_handler] 0-vol_workdir_amd-client-2: cma event RDMA_CM_EVENT_REJECTED, error 8 (me:10.0.4.1:1020 peer:10.0.4.1:49174)
Host Unreachable, Check your connection with IPoIB
[2015-07-21 21:10:37.877231] I [rpc-clnt.c:1819:rpc_clnt_reconfig] 0-vol_workdir_amd-client-0: changing port to 49173 (from 0)
[2015-07-21 21:10:37.880556] I [rpc-clnt.c:1819:rpc_clnt_reconfig] 0-vol_workdir_amd-client-2: changing port to 49174 (from 0)
[2015-07-21 21:10:37.914661] W [rdma.c:1263:gf_rdma_cm_event_handler] 0-vol_workdir_amd-client-0: cma event RDMA_CM_EVENT_REJECTED, error 8 (me:10.0.4.1:1021 peer:10.0.4.1:49173)
Host Unreachable, Check your connection with IPoIB
[2015-07-21 21:10:37.923535] W [rdma.c:1263:gf_rdma_cm_event_handler] 0-vol_workdir_amd-client-2: cma event RDMA_CM_EVENT_REJECTED, error 8 (me:10.0.4.1:1020 peer:10.0.4.1:49174)
Host Unreachable, Check your connection with IPoIB
[2015-07-21 21:10:41.883925] I [rpc-clnt.c:1819:rpc_clnt_reconfig] 0-vol_workdir_amd-client-0: changing port to 49173 (from 0)
[2015-07-21 21:10:41.887085] I [rpc-clnt.c:1819:rpc_clnt_reconfig] 0-vol_workdir_amd-client-2: changing port to 49174 (from 0)
[2015-07-21 21:10:41.919394] W [rdma.c:1263:gf_rdma_cm_event_handler] 0-vol_workdir_amd-client-0: cma event RDMA_CM_EVENT_REJECTED, error 8 (me:10.0.4.1:1021 peer:10.0.4.1:49173)
Host Unreachable, Check your connection with IPoIB
[2015-07-21 21:10:41.932622] W [rdma.c:1263:gf_rdma_cm_event_handler] 0-vol_workdir_amd-client-2: cma event RDMA_CM_EVENT_REJECTED, error 8 (me:10.0.4.1:1020 peer:10.0.4.1:49174)
Host Unreachable, Check your connection with IPoIB
[2015-07-21 21:10:44.682636] W [dht-layout.c:189:dht_layout_search] 0-vol_workdir_amd-dht: no subvolume for hash (value) = 1072520554
[2015-07-21 21:10:44.682947] W [dht-layout.c:189:dht_layout_search] 0-vol_workdir_amd-dht: no subvolume for hash (value) = 1072520554
[2015-07-21 21:10:44.683240] W [dht-layout.c:189:dht_layout_search] 0-vol_workdir_amd-dht: no subvolume for hash (value) = 1072520554
[2015-07-21 21:10:44.683472] W [dht-diskusage.c:48:dht_du_info_cbk] 0-vol_workdir_amd-dht: failed to get disk info from vol_workdir_amd-client-0
[2015-07-21 21:10:44.683506] W [dht-diskusage.c:48:dht_du_info_cbk] 0-vol_workdir_amd-dht: failed to get disk info from vol_workdir_amd-client-2
[2015-07-21 21:10:44.683532] W [dht-layout.c:189:dht_layout_search] 0-vol_workdir_amd-dht: no subvolume for hash (value) = 1072520554
[2015-07-21 21:10:44.683551] W [fuse-bridge.c:1970:fuse_create_cbk] 0-glusterfs-fuse: 18: /test => -1 (No such file or directory)
[2015-07-21 21:10:44.683619] W [dht-layout.c:189:dht_layout_search] 0-vol_workdir_amd-dht: no subvolume for hash (value) = 1072520554
[2015-07-21 21:10:44.683846] W [dht-layout.c:189:dht_layout_search] 0-vol_workdir_amd-dht: no subvolume for hash (value) = 1072520554
[2015-07-21 21:10:45.886807] I [rpc-clnt.c:1819:rpc_clnt_reconfig] 0-vol_workdir_amd-client-0: changing port to 49173 (from 0)
[2015-07-21 21:10:45.893059] I [rpc-clnt.c:1819:rpc_clnt_reconfig] 0-vol_workdir_amd-client-2: changing port to 49174 (from 0)
[2015-07-21 21:10:45.920434] W [rdma.c:1263:gf_rdma_cm_event_handler] 0-vol_workdir_amd-client-0: cma event RDMA_CM_EVENT_REJECTED, error 8 (me:10.0.4.1:1021 peer:10.0.4.1:49173)
Host Unreachable, Check your connection with IPoIB
[2015-07-21 21:10:45.925292] W [rdma.c:1263:gf_rdma_cm_event_handler] 0-vol_workdir_amd-client-2: cma event RDMA_CM_EVENT_REJECTED, error 8 (me:10.0.4.1:1020 peer:10.0.4.1:49174)
Host Unreachable, Check your connection with IPoIB

I have been using GlusterFS in production for around 3 years without any
blocking problem, but the situation has been chaotic for more than 3
weeks… Indeed, our production has been down for roughly 3.5 weeks (with
many different problems, first with GlusterFS v3.5.3 and now with
3.7.2-3), and I need to get it back up…

Thanks in advance,
Geoffrey
------------------------------------------------------
Geoffrey Letessier
Responsable informatique & ingénieur système
UPR 9080 - CNRS - Laboratoire de Biochimie Théorique
Institut de Biologie Physico-Chimique
13, rue Pierre et Marie Curie - 75005 Paris
Tel: 01 58 41 50 93 - eMail: geoffrey.letessier@xxxxxxx

On 21 Jul 2015, at 19:36, Soumya Koduri <skoduri@xxxxxxxxxx> wrote:

From the following errors,

[2015-07-21 14:36:30.495321] I [MSGID: 114020] [client.c:2118:notify] 0-vol_shared-client-0: parent translators are ready, attempting connect on transport
[2015-07-21 14:36:30.498989] W [socket.c:923:__socket_keepalive] 0-socket: failed to set TCP_USER_TIMEOUT 0 on socket 12, Protocol not available
[2015-07-21 14:36:30.499004] E [socket.c:3015:socket_connect] 0-vol_shared-client-0: Failed to set keep-alive: Protocol not available

it looks like setting the TCP_USER_TIMEOUT value to 0 on the socket failed with the error "Protocol not available".
Could you check whether 'network.ping-timeout' is set to zero for that volume, using 'gluster volume info'? Anyway, from the code it looks like 'TCP_USER_TIMEOUT' can take the value zero. Not sure why it failed.
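Something along these lines should show the current setting and, if needed, put it back (vol_shared is the volume from your log):

# gluster volume info vol_shared | grep -i ping-timeout
# gluster volume set vol_shared network.ping-timeout 42

Note that ping-timeout only appears under 'Options Reconfigured' if it was changed from the 42-second default.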

Niels, any thoughts?

Thanks,
Soumya

On 07/21/2015 08:15 PM, Geoffrey Letessier wrote:
[2015-07-21 14:36:30.495321] I [MSGID: 114020] [client.c:2118:notify] 0-vol_shared-client-0: parent translators are ready, attempting connect on transport
[2015-07-21 14:36:30.498989] W [socket.c:923:__socket_keepalive] 0-socket: failed to set TCP_USER_TIMEOUT 0 on socket 12, Protocol not available
[2015-07-21 14:36:30.499004] E [socket.c:3015:socket_connect] 0-vol_shared-client-0: Failed to set keep-alive: Protocol not available



_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users
