On 07/22/2015 12:55 PM, Geoffrey Letessier wrote:
Concerning the hang, I have only seen it once with the TCP protocol; RDMA actually seems to be the cause.
If you mount a tcp,rdma volume using the tcp protocol, all communication will go over the TCP connection; RDMA will not be used between client and server.
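For reference, with a tcp,rdma volume the FUSE client picks the transport at mount time; appending ".rdma" to the volume name selects RDMA (a sketch using the ib-storage1 hostname and /workdir mount point from this thread):

# mount -t glusterfs ib-storage1:/vol_workdir_amd /workdir
# mount -t glusterfs ib-storage1:/vol_workdir_amd.rdma /workdir

The first mount uses the default TCP transport; the second requests RDMA, which matches the "localhost:vol_workdir_amd.rdma" entry visible in the df output further below.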
… And, after a moment (a few minutes after having restarted my back-transfer of around 40 TB), my volume went down (and all my rsync processes with it):
[root@atlas ~]# df -h /mnt
df: « /mnt »: Noeud final de transport n'est pas connecté
df: aucun système de fichiers traité
aka "transport endpoint is not connected" / "no file systems processed"
Could you send me the following details, if possible?
1) the mount command used, 2) volume status, 3) client and brick logs
Regards
Rafi KC
Geoffrey
------------------------------------------------------
Geoffrey Letessier
Responsable informatique & ingénieur système
UPR 9080 - CNRS - Laboratoire de Biochimie Théorique
Institut de Biologie Physico-Chimique
13, rue Pierre et Marie Curie - 75005 Paris
Tel: 01 58 41 50 93 - eMail: geoffrey.letessier@xxxxxxx
Hi Rafi,
That is what I do. But I notice this kind of trouble particularly when I mount my volumes manually.
In addition, when I changed my transport-type from tcp or rdma to tcp,rdma, I had to restart my volumes for the change to take effect.
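(For reference, a sketch of such a transport change, assuming the config.transport volume option; the volume has to be stopped first:)

# gluster volume stop vol_workdir_amd
# gluster volume set vol_workdir_amd config.transport tcp,rdma
# gluster volume start vol_workdir_amd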
I wonder whether these troubles are not due to the RDMA protocol… because everything looks more stable with TCP.
Any other idea?
Thanks in advance for replying,
Geoffrey
------------------------------------------------------
Geoffrey Letessier
Responsable informatique & ingénieur système
UPR 9080 - CNRS - Laboratoire de Biochimie Théorique
Institut de Biologie Physico-Chimique
13, rue Pierre et Marie Curie - 75005 Paris
Tel: 01 58 41 50 93 - eMail: geoffrey.letessier@xxxxxxx
On 07/22/2015 04:51 AM, Geoffrey Letessier wrote:
Hi Niels,
Thanks for replying.
In fact, after having checked the logs, I discovered that GlusterFS tried to connect to a brick on a TCP (or RDMA) port allocated to another volume… (bug?)
For example, here is an extract of my workdir.log file:
[2015-07-21 21:34:01.820188] E [socket.c:2332:socket_connect_finish] 0-vol_workdir_amd-client-0: connection to 10.0.4.1:49161 failed (Connexion refusée)
[2015-07-21 21:34:01.822563] E [socket.c:2332:socket_connect_finish] 0-vol_workdir_amd-client-2: connection to 10.0.4.1:49162 failed (Connexion refusée)
("Connexion refusée" = "Connection refused")
But those two ports (49161 and 49162) belonged only to my vol_home volume, not to vol_workdir_amd.
Now, after having restarted all glusterd daemons synchronously (pdsh -w cl-storage[1-4] service glusterd restart), everything seems to be back to normal (size, write permissions, etc.).
But, a few minutes later, I noticed a strange thing I have been seeing since I upgraded my storage cluster from 3.5.3 to 3.7.2-3: when I try to mount some volumes (particularly my vol_shared volume, a replicated volume), my system can hang… And, because I use it in my bashrc file for my environment modules, I need to restart my node. The same happens if I run df on the mounted volume (when it doesn't hang during the mount itself).
With the TCP transport type, the situation seems more stable.
In addition: if I restart a storage node, I cannot use the Gluster CLI (it also hangs).
Do you have an idea?
Are you using a bash script to start and mount the volume? If so, add a sleep after the volume start and before the mount, to allow all the processes to start properly, because the RDMA protocol takes some time to initialize its resources.
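A minimal sketch of what that could look like (the 10-second delay, volume name, and mount point are assumptions to adapt):

#!/bin/bash
# Start the volume, then give the brick processes and RDMA initialization time to settle.
gluster volume start vol_shared
sleep 10
mount -t glusterfs ib-storage1:/vol_shared.rdma /shared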
Regards
Rafi KC
Once again, thanks a lot for your help,
Geoffrey
------------------------------------------------------
Geoffrey Letessier
Responsable informatique & ingénieur système
UPR 9080 - CNRS - Laboratoire de Biochimie Théorique
Institut de Biologie Physico-Chimique
13, rue Pierre et Marie Curie - 75005 Paris
Tel: 01 58 41 50 93 - eMail: geoffrey.letessier@xxxxxxx
On Tue, Jul 21, 2015 at 11:20:20PM +0200, Geoffrey Letessier wrote:
Hello Soumya, hello everybody,
network.ping-timeout was set to 42 seconds. I set it to 0 but it made no difference. The problem was that, after having re-set the transport-type to rdma,tcp, some bricks went down after a few minutes. Despite restarting the volumes, after a few minutes some [other/different] bricks went down again.
I'm not sure whether the ping-timeout is handled differently when RDMA is used. Adding two of the guys who know RDMA well on CC.
Now, after re-creating my volume, the bricks stay alive but, oddly, I am not able to write to my volume. In addition, I defined a distributed volume with 2 servers and 4 bricks of 250 GB each, yet my final volume seems to be sized at only 500 GB… It's astonishing.
As seen further below, the 500 GB volume size is caused by two unreachable bricks. When bricks are not reachable, their size cannot be detected by the client, and therefore 2x 250 GB is missing.
It is unclear to me why writing to a pure distributed volume fails. When a brick is not reachable and the file should be created there, it would normally get created on another brick. When the brick that should have the file comes back online, and a new lookup for the file is done, a so-called "link file" is created, which points to the file on the other brick. I guess the failure has to do with the connection issues, and I would suggest getting that solved first.
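For illustration, such a link file can be inspected directly on a brick: it appears as an empty file with mode ---------T, carrying a trusted.glusterfs.dht.linkto xattr that names the subvolume actually holding the data (a sketch; the brick path and file name are only examples):

# stat /export/brick_workdir/brick1/data/test
# getfattr -n trusted.glusterfs.dht.linkto /export/brick_workdir/brick1/data/test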
HTH,
Niels
Here you can find some information:
# gluster volume status vol_workdir_amd
Status of volume: vol_workdir_amd
Gluster process                                       TCP Port  RDMA Port  Online  Pid
---------------------------------------------------------------------------------------
Brick ib-storage1:/export/brick_workdir/brick1/data   49185     49186      Y       23098
Brick ib-storage3:/export/brick_workdir/brick1/data   49158     49159      Y       3886
Brick ib-storage1:/export/brick_workdir/brick2/data   49187     49188      Y       23117
Brick ib-storage3:/export/brick_workdir/brick2/data   49160     49161      Y       3905
# gluster volume info vol_workdir_amd
Volume Name: vol_workdir_amd
Type: Distribute
Volume ID: 087d26ea-c6df-4cbe-94af-ecd87b59aedb
Status: Started
Number of Bricks: 4
Transport-type: tcp,rdma
Bricks:
Brick1: ib-storage1:/export/brick_workdir/brick1/data
Brick2: ib-storage3:/export/brick_workdir/brick1/data
Brick3: ib-storage1:/export/brick_workdir/brick2/data
Brick4: ib-storage3:/export/brick_workdir/brick2/data
Options Reconfigured:
performance.readdir-ahead: on
# pdsh -w storage[1,3] df -h /export/brick_workdir/brick{1,2}
storage3: Filesystem                            Size  Used Avail Use% Mounted on
storage3: /dev/mapper/st--block1-blk1--workdir  250G   34M  250G   1% /export/brick_workdir/brick1
storage3: /dev/mapper/st--block2-blk2--workdir  250G   34M  250G   1% /export/brick_workdir/brick2
storage1: Filesystem                            Size  Used Avail Use% Mounted on
storage1: /dev/mapper/st--block1-blk1--workdir  250G   33M  250G   1% /export/brick_workdir/brick1
storage1: /dev/mapper/st--block2-blk2--workdir  250G   33M  250G   1% /export/brick_workdir/brick2
# df -h /workdir/
Filesystem                      Size  Used Avail Use% Mounted on
localhost:vol_workdir_amd.rdma  500G   67M  500G   1% /workdir
# touch /workdir/test
touch: impossible de faire un touch « /workdir/test »: Aucun fichier ou dossier de ce type
(i.e. "cannot touch '/workdir/test': No such file or directory")
# tail -30l /var/log/glusterfs/workdir.log
Host Unreachable, Check your connection with IPoIB
[2015-07-21 21:10:33.927673] W [rdma.c:1263:gf_rdma_cm_event_handler] 0-vol_workdir_amd-client-2: cma event RDMA_CM_EVENT_REJECTED, error 8 (me:10.0.4.1:1020 peer:10.0.4.1:49174)
Host Unreachable, Check your connection with IPoIB
[2015-07-21 21:10:37.877231] I [rpc-clnt.c:1819:rpc_clnt_reconfig] 0-vol_workdir_amd-client-0: changing port to 49173 (from 0)
[2015-07-21 21:10:37.880556] I [rpc-clnt.c:1819:rpc_clnt_reconfig] 0-vol_workdir_amd-client-2: changing port to 49174 (from 0)
[2015-07-21 21:10:37.914661] W [rdma.c:1263:gf_rdma_cm_event_handler] 0-vol_workdir_amd-client-0: cma event RDMA_CM_EVENT_REJECTED, error 8 (me:10.0.4.1:1021 peer:10.0.4.1:49173)
Host Unreachable, Check your connection with IPoIB
[2015-07-21 21:10:37.923535] W [rdma.c:1263:gf_rdma_cm_event_handler] 0-vol_workdir_amd-client-2: cma event RDMA_CM_EVENT_REJECTED, error 8 (me:10.0.4.1:1020 peer:10.0.4.1:49174)
Host Unreachable, Check your connection with IPoIB
[2015-07-21 21:10:41.883925] I [rpc-clnt.c:1819:rpc_clnt_reconfig] 0-vol_workdir_amd-client-0: changing port to 49173 (from 0)
[2015-07-21 21:10:41.887085] I [rpc-clnt.c:1819:rpc_clnt_reconfig] 0-vol_workdir_amd-client-2: changing port to 49174 (from 0)
[2015-07-21 21:10:41.919394] W [rdma.c:1263:gf_rdma_cm_event_handler] 0-vol_workdir_amd-client-0: cma event RDMA_CM_EVENT_REJECTED, error 8 (me:10.0.4.1:1021 peer:10.0.4.1:49173)
Host Unreachable, Check your connection with IPoIB
[2015-07-21 21:10:41.932622] W [rdma.c:1263:gf_rdma_cm_event_handler] 0-vol_workdir_amd-client-2: cma event RDMA_CM_EVENT_REJECTED, error 8 (me:10.0.4.1:1020 peer:10.0.4.1:49174)
Host Unreachable, Check your connection with IPoIB
[2015-07-21 21:10:44.682636] W [dht-layout.c:189:dht_layout_search] 0-vol_workdir_amd-dht: no subvolume for hash (value) = 1072520554
[2015-07-21 21:10:44.682947] W [dht-layout.c:189:dht_layout_search] 0-vol_workdir_amd-dht: no subvolume for hash (value) = 1072520554
[2015-07-21 21:10:44.683240] W [dht-layout.c:189:dht_layout_search] 0-vol_workdir_amd-dht: no subvolume for hash (value) = 1072520554
[2015-07-21 21:10:44.683472] W [dht-diskusage.c:48:dht_du_info_cbk] 0-vol_workdir_amd-dht: failed to get disk info from vol_workdir_amd-client-0
[2015-07-21 21:10:44.683506] W [dht-diskusage.c:48:dht_du_info_cbk] 0-vol_workdir_amd-dht: failed to get disk info from vol_workdir_amd-client-2
[2015-07-21 21:10:44.683532] W [dht-layout.c:189:dht_layout_search] 0-vol_workdir_amd-dht: no subvolume for hash (value) = 1072520554
[2015-07-21 21:10:44.683551] W [fuse-bridge.c:1970:fuse_create_cbk] 0-glusterfs-fuse: 18: /test => -1 (Aucun fichier ou dossier de ce type)
[2015-07-21 21:10:44.683619] W [dht-layout.c:189:dht_layout_search] 0-vol_workdir_amd-dht: no subvolume for hash (value) = 1072520554
[2015-07-21 21:10:44.683846] W [dht-layout.c:189:dht_layout_search] 0-vol_workdir_amd-dht: no subvolume for hash (value) = 1072520554
[2015-07-21 21:10:45.886807] I [rpc-clnt.c:1819:rpc_clnt_reconfig] 0-vol_workdir_amd-client-0: changing port to 49173 (from 0)
[2015-07-21 21:10:45.893059] I [rpc-clnt.c:1819:rpc_clnt_reconfig] 0-vol_workdir_amd-client-2: changing port to 49174 (from 0)
[2015-07-21 21:10:45.920434] W [rdma.c:1263:gf_rdma_cm_event_handler] 0-vol_workdir_amd-client-0: cma event RDMA_CM_EVENT_REJECTED, error 8 (me:10.0.4.1:1021 peer:10.0.4.1:49173)
Host Unreachable, Check your connection with IPoIB
[2015-07-21 21:10:45.925292] W [rdma.c:1263:gf_rdma_cm_event_handler] 0-vol_workdir_amd-client-2: cma event RDMA_CM_EVENT_REJECTED, error 8 (me:10.0.4.1:1020 peer:10.0.4.1:49174)
Host Unreachable, Check your connection with IPoIB
I have been using GlusterFS in production for around 3 years without any blocking problem, but the situation has been dreadful for more than 3 weeks now… Indeed, our production has been down for roughly 3.5 weeks (with many different problems, first with GlusterFS v3.5.3 and now with 3.7.2-3), and I need to get it back up…
Thanks in advance,
Geoffrey
------------------------------------------------------
Geoffrey Letessier
Responsable informatique & ingénieur système
UPR 9080 - CNRS - Laboratoire de Biochimie Théorique
Institut de Biologie Physico-Chimique
13, rue Pierre et Marie Curie - 75005 Paris
Tel: 01 58 41 50 93 - eMail: geoffrey.letessier@xxxxxxx
On 21 Jul 2015, at 19:36, Soumya Koduri <skoduri@xxxxxxxxxx> wrote:
From the following errors,
[2015-07-21 14:36:30.495321] I [MSGID: 114020] [client.c:2118:notify] 0-vol_shared-client-0: parent translators are ready, attempting connect on transport
[2015-07-21 14:36:30.498989] W [socket.c:923:__socket_keepalive] 0-socket: failed to set TCP_USER_TIMEOUT 0 on socket 12, Protocole non disponible
[2015-07-21 14:36:30.499004] E [socket.c:3015:socket_connect] 0-vol_shared-client-0: Failed to set keep-alive: Protocole non disponible
it looks like setting the TCP_USER_TIMEOUT value to 0 on the socket failed with (IIUC) "Protocol not available" ("Protocole non disponible"). Could you check whether 'network.ping-timeout' is set to zero for that volume using 'gluster volume info'? Anyway, from the code it looks like TCP_USER_TIMEOUT can take the value zero; I am not sure why it failed.
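(For reference, a quick way to check whether the option was reconfigured, and to set it back to the 42-second default; vol_shared is assumed as the volume name:)

# gluster volume info vol_shared | grep ping-timeout
# gluster volume set vol_shared network.ping-timeout 42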
Niels, any thoughts?
Thanks,
Soumya