Re: geo replication, invalid slave name and gluster 3.5.1

Stefan Moravcik <smoravcik@xxxxxxxxxxxxxx> · Tue, 22 Jul 2014 08:53:40 +0200

    Hello,

    I deleted all the files and directories under /var/lib/glusterd on
    the slave, repeated all the steps, and it worked. I was no longer
    able to reproduce the problem with the faulty slave.

    I am not sure what the problem was, but I think it was caused by one
    of the steps from the previous geo-replication documentation, maybe
    a value in the config that broke the status and the replication.
    After a cleanup and doing everything from scratch, the new
    documentation worked well. Thanks again.

    Best regards,

    Stefan

    On 17/07/14 13:18, Aravinda wrote:

      On 07/16/2014 02:20 PM, Stefan Moravcik wrote:

        Hello Vishwanath,

        thanks for pointing me in the right direction, this was helpful.
        I thought the passwordless ssh connection was set up by
        glusterfs using the secret.pem in the initial run, but it wasn't.
        I had to create the id_rsa in the /root/.ssh/ directory to be
        able to ssh to the slave without any -i option.

        Great, thanks for that. However, I have an additional question,
        again a little different from the previous ones. This seems
        like a bug to me, but you will surely know better.

        After I created the geo-replication session and started it,
        everything looked OK and successful. Then I ran the status
        command and got this:

        MASTER NODE    MASTER VOL    MASTER BRICK      SLAVE                    STATUS    CHECKPOINT STATUS    CRAWL STATUS
        -------------------------------------------------------------------------------------------------------------------
        1.1.1.1        myvol1        /shared/myvol1    1.2.3.4::myvol1_slave    faulty    N/A                  N/A
        1.1.1.2        myvol1        /shared/myvol1    1.2.3.4::myvol1_slave    faulty    N/A                  N/A
        1.1.1.3        myvol1        /shared/myvol1    1.2.3.4::myvol1_slave    faulty    N/A                  N/A
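
With several bricks it can help to summarize the STATUS column instead of reading the table by eye. The following is a minimal sketch, assuming the column layout shown in this message (STATUS is the fifth whitespace-separated field and brick paths contain no spaces). `count_faulty` is a hypothetical helper name, not part of gluster.

```shell
#!/bin/sh
# count_faulty: hypothetical helper, not part of the gluster CLI.
# Reads geo-replication status output on stdin and prints how many rows
# report "faulty". Assumes STATUS is field 5, as in the 3.5.1 table above;
# other releases may use a different column order.
count_faulty() {
    awk 'tolower($5) == "faulty" { n++ } END { print n + 0 }'
}

# Example using the three rows reported in this thread.
count_faulty <<'EOF'
MASTER NODE    MASTER VOL    MASTER BRICK      SLAVE                    STATUS    CHECKPOINT STATUS    CRAWL STATUS
1.1.1.1        myvol1        /shared/myvol1    1.2.3.4::myvol1_slave    faulty    N/A                  N/A
1.1.1.2        myvol1        /shared/myvol1    1.2.3.4::myvol1_slave    faulty    N/A                  N/A
1.1.1.3        myvol1        /shared/myvol1    1.2.3.4::myvol1_slave    faulty    N/A                  N/A
EOF
```

On a live system the input would come from the status command itself, for example `gluster volume geo-replication myvol1 1.2.3.4::myvol1_slave status | count_faulty`.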

        When I checked the config, it contains

        remote_gsyncd: /nonexistent/gsyncd

        I even tried to create symlinks for this path, but the faulty
        status never went away. I found a bug report on Bugzilla:
        https://bugzilla.redhat.com/show_bug.cgi?id=1105283

      Update the conf file manually as follows, then stop and start
      geo-replication. (Conf file location:
      /var/lib/glusterd/geo-replication/<MASTER VOL>_<SLAVE IP>_<SLAVE VOL>/gsyncd.conf)

      remote_gsyncd = /usr/libexec/glusterfs/gsyncd

      Let us know if this resolves the issue.
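
The manual edit suggested here can also be scripted. This is a rough sketch, not an official procedure: `set_remote_gsyncd` is a hypothetical helper, it assumes the conf file holds the setting as a `remote_gsyncd = ...` line, and the default path is the one from this message, which may differ on other distributions.

```shell
#!/bin/sh
# set_remote_gsyncd: hypothetical helper, not shipped with gluster.
# Usage: set_remote_gsyncd <path-to-gsyncd.conf> [path-to-gsyncd]
# Rewrites every "remote_gsyncd = ..." line and keeps a .bak copy.
set_remote_gsyncd() {
    conf=$1
    # Default taken from this thread; verify where your package installs gsyncd.
    gsyncd=${2:-/usr/libexec/glusterfs/gsyncd}

    [ -f "$conf" ] || { echo "no such file: $conf" >&2; return 1; }
    cp -p "$conf" "$conf.bak" || return 1

    # Keep the "remote_gsyncd =" prefix, replace only the value.
    sed "s|^\([[:space:]]*remote_gsyncd[[:space:]]*=\).*|\1 $gsyncd|" \
        "$conf.bak" > "$conf"
}

# Afterwards, stop and start the session as described above, e.g.:
#   gluster volume geo-replication myvol1 1.2.3.4::myvol1_slave stop
#   gluster volume geo-replication myvol1 1.2.3.4::myvol1_slave start
```

The backup copy makes it easy to compare or roll back if the session still reports faulty after the restart.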

      --

      regards

      Aravinda

      http://aravindavk.in

        [2014-07-16 07:14:34.718718] E [glusterd-geo-rep.c:2685:glusterd_gsync_read_frm_status] 0-: Unable to read gsyncd status file

        [2014-07-16 07:14:34.718756] E [glusterd-geo-rep.c:2999:glusterd_read_status_file] 0-: Unable to read the statusfile for /shared/myvol1 brick for repository(master), 1.2.3.4::myvol1_slave(slave) session

        However, since the symlink is in place, the error message above
        no longer shows up in the log. In fact there are no more error
        logs, just the faulty status.

        Even more interesting: when I changed the configuration from
        rsync to tar+ssh, it synced the existing files, but it does not
        replicate any changes or newly created files.

        MASTER NODE    MASTER VOL    MASTER BRICK      SLAVE                    STATUS    CHECKPOINT STATUS    CRAWL STATUS    FILES SYNCD    FILES PENDING    BYTES PENDING    DELETES PENDING    FILES SKIPPED
        -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
        1.1.1.1        myvol1        /shared/myvol1    1.2.3.4::myvol1_slave    faulty    N/A                  N/A             10001          0                0                0                  0
        1.1.1.2        myvol1        /shared/myvol1    1.2.3.4::myvol1_slave    faulty    N/A                  N/A             0              0                0                0                  0
        1.1.1.3        myvol1        /shared/myvol1    1.2.3.4::myvol1_slave    faulty    N/A                  N/A             0              0                0                0                  0

        As you can see, 10001 files were replicated, but if I create a
        new file or edit an existing one, the change is not replicated
        and the status stays faulty. This is true even if I change back
        from tar+ssh to rsync, restart glusterd, or anything else.

        Thank you for all your help, much appreciated

        Regards,

        Stefan 

        On 15/07/14 17:15, M S Vishwanath Bhat wrote:

          On 15/07/14 18:13, Stefan Moravcik wrote:

          Hello Vishwanath 

            thank you for your quick reply, but I have a follow-up
            question if that is OK. It may be a different issue and
            perhaps I should open a new thread, but I will try to
            continue using this one.

            So I followed the new documentation. Let me show you what
            I have done and what the final error message is.

            I have 3 servers, node1, node2 and node3, with IPs 1.1.1.1,
            1.1.1.2 and 1.1.1.3.

            I installed glusterfs-server and glusterfs-geo-replication
            on all 3 of them. I created a replica volume called myvol1
            and ran the command

            gluster system:: execute gsec_create 

            this created 4 files: 

            secret.pem 

            secret.pem.pub 

            tar_ssh.pem 

            tar_ssh.pem.pub 

            The pub file is different on each of the 3 nodes, so I
            copied all 3 secret.pem.pub files to the slave's
            authorized_keys. I tried to ssh directly to the slave server
            from all 3 nodes and got through with no problem.

            Then I connected to the slave server and installed
            glusterfs-server and glusterfs-geo-replication there too.

            I started glusterd and created a volume called
            myvol1_slave.

            Then I peer probed the slave from one of the masters. The
            slave volume showed up on my master and the peer appeared in
            peer status.

            From here I ran the command from your documentation:

            volume geo-replication myvol1 1.2.3.4::myvol1_slave create push-pem

            Passwordless ssh login has not been setup with 1.2.3.4. 

            geo-replication command failed 

          Couple of things here.

          I believe it was not clear enough in the docs, and I apologise
          for that. But this is the prerequisite for dist-geo-rep:

          * There should be passwordless ssh set up from at least one
            node in the master volume to one node in the slave volume.
            The geo-rep create command should be executed from the
            master node that has passwordless ssh to the slave.

          So in your case, you can set up passwordless ssh from
          1.1.1.1 (one master volume node) to 1.2.3.4 (one slave volume
          node). You can use "ssh-keygen" and "ssh-copy-id" to do that.

          After the above step is done, execute "gluster system::
          execute gsec_create". You don't need to copy the keys to the
          slave's authorized_keys; geo-rep create push-pem takes care of
          that for you.

          Now you should execute "gluster volume geo-rep myvol1
          1.2.3.4::myvol1_slave create push-pem" from 1.1.1.1 (because
          this node has passwordless ssh to 1.2.3.4, the slave host
          named in the command).

          That should create a geo-rep session for you. That can be
          started later on.
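
The sequence described above can be written down as a short checklist. This is a sketch only: `georep_plan` is a hypothetical helper that prints the commands rather than running them, and logging in to the slave as root is an assumption based on this thread, not a requirement stated here.

```shell
#!/bin/sh
# georep_plan: hypothetical helper that only PRINTS the steps, in order.
# Run the printed commands by hand, as root, on the one master node that
# will have passwordless ssh to the slave (1.1.1.1 in this thread).
# Usage: georep_plan <master-vol> <slave-host> <slave-vol>
georep_plan() {
    [ $# -eq 3 ] || { echo "usage: georep_plan MASTER_VOL SLAVE_HOST SLAVE_VOL" >&2; return 1; }
    master_vol=$1
    slave_host=$2
    slave_vol=$3
    # root@ is an assumption; the thread uses /root/.ssh on the master.
    cat <<EOF
ssh-keygen
ssh-copy-id root@${slave_host}
gluster system:: execute gsec_create
gluster volume geo-replication ${master_vol} ${slave_host}::${slave_vol} create push-pem
gluster volume geo-replication ${master_vol} ${slave_host}::${slave_vol} start
EOF
}

georep_plan myvol1 1.2.3.4 myvol1_slave
```

Printing instead of executing keeps the key generation step interactive and lets each command's result be checked before moving on to the next one.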

          And you don't need to peer probe slave from master or vice
          versa. Logically both master and slave volumes are in
          different clusters (in two different geographic locations).

          HTH,

          Vishwanath

            In the secure log file on the slave I could see the connection, though.

            2014-07-15T13:26:56.083445+01:00 1testlab sshd[23905]: Set /proc/self/oom_score_adj to 0

            2014-07-15T13:26:56.089423+01:00 1testlab sshd[23905]: Connection from 1.1.1.1 port 58351

            2014-07-15T13:26:56.248687+01:00 1testlab sshd[23906]: Connection closed by 1.1.1.1

            and in the logs of one of the masters 

            [2014-07-15 12:26:56.247667] E [glusterd-geo-rep.c:1889:glusterd_verify_slave] 0-: Not a valid slave

            [2014-07-15 12:26:56.247752] E [glusterd-geo-rep.c:2106:glusterd_op_stage_gsync_create] 0-: 1.2.3.4::myvol1_slave is not a valid slave volume. Error: Passwordless ssh login has not been setup with 1.2.3.4.

            [2014-07-15 12:26:56.247772] E [glusterd-syncop.c:912:gd_stage_op_phase] 0-management: Staging of operation 'Volume Geo-replication Create' failed on localhost : Passwordless ssh login has not been setup with 1.2.3.4.

            There is nothing in the logs of the other masters in the
            cluster, nor on the slave.

            I even tried with the force option, but got the same result.
            I disabled the firewall and SELinux just to make sure those
            parts of the system do not interfere. I searched Google for
            the same problem and found one report,
            http://irclog.perlgeek.de/gluster/2014-01-16
            but again with no answer or solution.

            Thank you for your time and help. 

            Best regards, 

            Stefan 

            On 15/07/14 12:26, M S Vishwanath Bhat wrote: 

            On 15/07/14 15:08, Stefan Moravcik wrote:

              Hello Guys, 

                I have been trying to set up geo-replication in our
                glusterfs test environment and ran into a problem with
                the message "invalid slave name".

                So, first things first.

                I have 3 nodes configured in a cluster as a replica. On
                this cluster I have created a volume named, let's say,
                myvol1. So far everything works and looks good.

                The next step was to create an off-site geo-replication
                target, so I followed this documentation:

                http://www.gluster.org/community/documentation/index.php/HowTo:geo-replication

              These are old docs. I have edited the page to mention that
              it covers the old geo-rep.

              Please refer to https://github.com/gluster/glusterfs/blob/master/doc/admin-guide/en-US/markdown/admin_distributed_geo_rep.md
              or https://medium.com/@msvbhat/distributed-geo-replication-in-glusterfs-ec95f4393c50
              for the latest distributed-geo-rep documentation.

                I had peered the slave server, created secret.pem, was
                able to ssh without a password, and then tried to create
                the geo-replication session with the command from the
                documentation. I got the following error:

                on master:

                gluster volume geo-replication myvol1 1.2.3.4:/shared/myvol1_slave start

                on master: 

                [2014-07-15 09:15:37.188701] E [glusterd-geo-rep.c:4083:glusterd_get_slave_info] 0-: Invalid slave name

                [2014-07-15 09:15:37.188827] W [dict.c:778:str_to_data] (-->/usr/lib64/glusterfs/3.5.1/xlator/mgmt/glusterd.so(glusterd_op_stage_gsync_create+0x1e2) [0x7f979e20f1f2] (-->/usr/lib64/glusterfs/3.5.1/xlator/mgmt/glusterd.so(glusterd_get_slave_details_confpath+0x116) [0x7f979e20a306] (-->/usr/lib64/libglusterfs.so.0(dict_set_str+0x1c) [0x7f97a322045c]))) 0-dict: value is NULL

                [2014-07-15 09:15:37.188837] E [glusterd-geo-rep.c:3995:glusterd_get_slave_details_confpath] 0-: Unable to store slave volume name.

                [2014-07-15 09:15:37.188849] E [glusterd-geo-rep.c:2056:glusterd_op_stage_gsync_create] 0-: Unable to fetch slave or confpath details.

                [2014-07-15 09:15:37.188861] E [glusterd-syncop.c:912:gd_stage_op_phase] 0-management: Staging of operation 'Volume Geo-replication Create' failed on localhost

                There are no logs on the slave whatsoever.

                I also tried different documentation with "create
                push-pem" and got the very same problem as above.

                I tried to start the session with node:/path/to/dir, and
                also created a volume on the slave and started it as
                node:/slave_volume_name, always with the same result.

                I searched for a solution and found this: http://fpaste.org/114290/04117421/

                It was a different user with the very same problem. The
                issue was raised on the IRC channel but never answered.

                This is a fresh install of 3.5.1, so no upgrade should
                be needed, I guess. Any help solving this problem would
                be appreciated.

              From what you have described, it looks like your slave is
              not a gluster volume. In the latest geo-rep, the slave has
              to be a gluster volume; glusterfs no longer supports a
              plain directory as a slave.
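
A quick syntactic check can catch the directory-style slave before the command is sent to glusterd. This is only a sketch modelled on the forms that appear in this thread: `is_volume_slave` is a hypothetical helper, it accepts `host::volume` and rejects anything containing a path, and it does not replace the validation glusterd performs (ssh reachability, volume existence, and so on).

```shell
#!/bin/sh
# is_volume_slave: hypothetical helper, not part of gluster.
# Returns 0 if the argument looks like host::volume (the form used with
# "create push-pem" in this thread), 1 otherwise.
is_volume_slave() {
    case $1 in
        */*)    return 1 ;;  # contains a path, e.g. host:/dir (old directory slave)
        ?*::?*) return 0 ;;  # host::volume
        *)      return 1 ;;
    esac
}

# Try the two slave specs that were used in this thread.
for spec in 1.2.3.4::myvol1_slave 1.2.3.4:/shared/myvol1_slave; do
    if is_volume_slave "$spec"; then
        echo "$spec: looks like a gluster volume slave"
    else
        echo "$spec: not in host::volume form"
    fi
done
```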

              Please follow new documentation and try once more. 

              HTH 

              Best Regards, 

              Vishwanath 

                Thank you and best regards, 

                Stefan 

        ***************************************************************************************************************************************************************************
        This email and any files transmitted with it are confidential and
        intended solely for the use of the individual or entity to whom
        they are addressed.
        If you have received this email in error please reply to the
        sender indicating that fact and delete the copy you received.
        In addition, if you are not the intended recipient, you should
        not print, copy, retransmit, disseminate, or otherwise use the
        information contained in this communication. Thank you.

        Newsweaver is a Trade Mark of E-Search Ltd. Registered in
        Ireland No. 254994.
        Registered Office: 2200 Airport Business Park, Kinsale Road,
        Cork, Ireland. International Telephone Number: +353 21 2427277.
        ***************************************************************************************************************************************************************************

        _______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://supercolony.gluster.org/mailman/listinfo/gluster-users
