Re: [SOLVED] Very Basic question

Hi,
problem solved: it was a very stupid firewall problem.
The firewall was not configured correctly for the monitor ports.
What I still don't understand is why the OSD refused to start when
the issue was only with the monitor part?!
Thank you.
Regards.

    Luca
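For anyone who hits the same thing: the monitors listen on TCP 6789 and the OSD/MDS daemons use the 6800-7300 range, so those ports have to be open on every node. A minimal sketch of opening them with firewalld (assuming a CentOS/RHEL 7 style setup; adapt the commands if you use plain iptables):

# allow the Ceph monitor port and the OSD/MDS port range
firewall-cmd --zone=public --add-port=6789/tcp --permanent
firewall-cmd --zone=public --add-port=6800-7300/tcp --permanent
firewall-cmd --reload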

On 11/13/2014 06:40 PM, Luca Mazzaferro wrote:
Moreover, if I restart the service on ceph-node1, which is the initial monitor and also hosts an OSD and an MDS:

[root@ceph-node1 ~]# service ceph restart
=== mon.ceph-node1 ===
=== mon.ceph-node1 ===
Stopping Ceph mon.ceph-node1 on ceph-node1...kill 1215...done
=== mon.ceph-node1 ===
Starting Ceph mon.ceph-node1 on ceph-node1...
Starting ceph-create-keys on ceph-node1...
=== osd.2 ===
=== osd.2 ===
Stopping Ceph osd.2 on ceph-node1...done
=== osd.2 ===
2014-11-13 18:30:58.300930 7fef46bfd700  0 -- :/1002590 >> 192.168.122.23:6789/0 pipe(0x7fef40000c00 sd=4 :0 s=1 pgs=0 cs=0 l=1 c=0x7fef40000e90).fault
2014-11-13 18:31:10.302308 7fef4c1ce700  0 -- 192.168.122.21:0/1002590 >> 192.168.122.23:6789/0 pipe(0x7fef40002000 sd=4 :0 s=1 pgs=0 cs=0 l=1 c=0x7fef40000c00).fault
2014-11-13 18:31:16.303037 7fef4c1ce700  0 -- 192.168.122.21:0/1002590 >> 192.168.122.23:6789/0 pipe(0x7fef40005d30 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x7fef400020d0).fault
failed: 'timeout 30 /usr/bin/ceph -c /etc/ceph/ceph.conf --name=osd.2 --keyring=/var/lib/ceph/osd/ceph-2/keyring osd crush create-or-move -- 2 0.02 host=ceph-node1 root=default'
=== mds.ceph-node1 ===
=== mds.ceph-node1 ===
Stopping Ceph mds.ceph-node1 on ceph-node1...kill 1296...done
=== mds.ceph-node1 ===
Starting Ceph mds.ceph-node1 on ceph-node1...
starting mds.ceph-node1 at :/0
[root@ceph-node1 ~]# service ceph status
=== mon.ceph-node1 ===
mon.ceph-node1: running {"version":"0.80.7"}
=== osd.2 ===
osd.2: not running.
=== mds.ceph-node1 ===
mds.ceph-node1: running {"version":"0.80.7"}

The worst part, I think, is this one:
failed: 'timeout 30 /usr/bin/ceph -c /etc/ceph/ceph.conf --name=osd.2 --keyring=/var/lib/ceph/osd/ceph-2/keyring osd crush create-or-move -- 2 0.02 host=ceph-node1 root=default'

The OSD is not starting...
Cheers.

    Luca
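For the record, that failed line is almost certainly why the OSD would not start: on 0.80.x the sysvinit script runs the 'ceph ... osd crush create-or-move' command as a pre-start step, and that command has to reach a monitor. When it times out (as in the .fault lines above), the script gives up and never launches the OSD daemon. Once the monitor port is reachable again, a rough sketch of bringing just the OSD back by hand (the TCP probe is plain bash, no extra tools assumed):

# check that the monitor port answers from this node
timeout 5 bash -c '</dev/tcp/192.168.122.23/6789' && echo "mon reachable" || echo "mon blocked"
# then start only the OSD
service ceph start osd.2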

On 11/13/2014 06:33 PM, Luca Mazzaferro wrote:
Hi,
thank you for your answer:

On 11/13/2014 06:17 PM, Gregory Farnum wrote:
What does "ceph -s" output when things are working?

BEFORE the problem (from ceph -w, because I don't have ceph -s output):

[rzgceph@admin-node my-cluster]$ ceph -w
    cluster 6fa39bb3-de2d-4ec5-9a86-9d96231d8b5b
     health HEALTH_OK
     monmap e3: 3 mons at {ceph-node1=192.168.122.21:6789/0,ceph-node2=192.168.122.22:6789/0,ceph-node3=192.168.122.23:6789/0}, election epoch 6, quorum 0,1,2 ceph-node1,ceph-node2,ceph-node3
     mdsmap e4: 1/1/1 up {0=ceph-node1=up:active}
     osdmap e13: 3 osds: 3 up, 3 in
      pgmap v26: 192 pgs, 3 pools, 1889 bytes data, 21 objects
            103 MB used, 76655 MB / 76759 MB avail
                 192 active+clean

2014-11-13 17:08:43.240961 mon.0 [INF] pgmap v26: 192 pgs: 192 active+clean; 1889 bytes data, 103 MB used, 76655 MB / 76759 MB avail; 8 B/s wr, 0 op/s

Does the ceph.conf on your admin node contain the address of each monitor? (Paste in the relevant lines.) It will need to, or the ceph tool won't be able to find the monitors even though the system is working.
No, only the initial one... The documentation doesn't say so, but it makes sense.
I added the other two. This is my ceph.conf:

[global]
auth_service_required = cephx
filestore_xattr_use_omap = true
auth_client_required = cephx
auth_cluster_required = cephx
mon_host = 192.168.122.21 192.68.122.22 192.168.122.23
mon_initial_members = ceph-node1
fsid = 6fa39bb3-de2d-4ec5-9a86-9d96231d8b5b
osd pool default size = 2
public network = 192.168.0.0/16
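One detail worth double-checking in that conf: the second address in mon_host reads 192.68.122.22, which looks like a typo for 192.168.122.22. Assuming the monitors really live on 192.168.122.x, the line would conventionally be written comma-separated, e.g.:

mon_host = 192.168.122.21,192.168.122.22,192.168.122.23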


and then:
ceph-deploy --overwrite-conf admin admin-node ceph-node1 ceph-node2 ceph-node3

and now:
2014-11-13 18:24:57.522590 7fa4282d1700  0 -- 192.168.122.11:0/1003667 >> 192.168.122.23:6789/0 pipe(0x7fa418001d40 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x7fa418001fb0).fault
2014-11-13 18:25:06.524145 7fa4283d2700  0 -- 192.168.122.11:0/1003667 >> 192.168.122.23:6789/0 pipe(0x7fa418002fa0 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x7fa418003210).fault
2014-11-13 18:25:12.525096 7fa4283d2700  0 -- 192.168.122.11:0/1003667 >> 192.168.122.23:6789/0 pipe(0x7fa418003bf0 sd=4 :0 s=1 pgs=0 cs=0 l=1 c=0x7fa418003e60).fault
2014-11-13 18:25:21.526622 7fa4282d1700  0 -- 192.168.122.11:0/1003667 >> 192.168.122.23:6789/0 pipe(0x7fa4180085a0 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x7fa418008810).fault
2014-11-13 18:25:33.528831 7fa4284d3700  0 -- 192.168.122.11:0/1003667 >> 192.168.122.23:6789/0 pipe(0x7fa4180085a0 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x7fa418008810).fault
2014-11-13 18:25:42.530185 7fa4284d3700  0 -- 192.168.122.11:0/1003667 >> 192.168.122.23:6789/0 pipe(0x7fa418009740 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x7fa4180099b0).fault
2014-11-13 18:25:51.531688 7fa4283d2700  0 -- 192.168.122.11:0/1003667 >> 192.168.122.23:6789/0 pipe(0x7fa41800a330 sd=4 :0 s=1 pgs=0 cs=0 l=1 c=0x7fa41800a5a0).fault
2014-11-13 18:26:09.534223 7fa4284d3700  0 -- 192.168.122.11:0/1003667 >> 192.168.122.23:6789/0 pipe(0x7fa41800d550 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x7fa41800e6b0).fault

Better: someone (ceph-node3) answers, but not in the right way, I see.

     Luca
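Those .fault lines just mean the client keeps opening connections to 192.168.122.23:6789 and never gets a usable reply, which is the usual signature of a firewall dropping or rejecting traffic rather than the monitor being down. A quick sketch of checking on ceph-node3 itself (assuming the default admin socket path):

# is the monitor process actually listening on 6789?
ss -lnt | grep 6789
# what does the monitor itself think its state is?
ceph --admin-daemon /var/run/ceph/ceph-mon.ceph-node3.asok mon_status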

-Greg
On Thu, Nov 13, 2014 at 9:11 AM Luca Mazzaferro <luca.mazzaferro@xxxxxxxxxx> wrote:
Hi,

On 11/13/2014 06:05 PM, Artem Silenkov wrote:
Hello! 

Only one monitor instance? It won't work in most cases.
Make more and ensure they can reach quorum, for survivability.

No, three monitor instances, one for each ceph-node, as designed in the
quick-ceph-deploy guide.

I tried to kill one of them (the initial monitor) to see what would happen, and this is what happened.
:-(
Ciao


    Luca
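To see whether the surviving monitors still form a quorum while ceph-node1 is down, something like this should work, pointing the client at one of the remaining monitors explicitly (addresses taken from the monmap above):

ceph -m 192.168.122.22:6789 quorum_status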

Regards, Silenkov Artem 
---

2014-11-13 20:02 GMT+03:00 Luca Mazzaferro <luca.mazzaferro@xxxxxxxxxx>:
Dear Users,
I followed the instruction of the storage cluster quick start here:

http://ceph.com/docs/master/start/quick-ceph-deploy/

I simulated a small storage cluster with 4 VMs: ceph-node[1,2,3] and an admin-node.
Everything worked fine until I shut down the initial monitor node (ceph-node1).

This happened even with the other two monitors still running.

I restarted ceph-node1, but the ceph command (run from the admin node) fails after hanging for 5 minutes,
with this error:
2014-11-13 17:33:31.711410 7f6a5b1af700  0 monclient(hunting): authenticate timed out after 300
2014-11-13 17:33:31.711522 7f6a5b1af700  0 librados: client.admin authentication error (110) Connection timed out
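If the ceph.conf on the admin node only lists the initial monitor, the client has nowhere else to try once ceph-node1 is down, which would explain the 300-second authenticate timeout. As a hedged workaround while the conf is being fixed, the tool can be pointed at a surviving monitor explicitly:

ceph -m 192.168.122.22:6789 -s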

If I go to ceph-node1 and restart the services, the status shows:

[root@ceph-node1 ~]# service ceph status
=== mon.ceph-node1 ===
mon.ceph-node1: running {"version":"0.80.7"}
=== osd.2 ===
osd.2: not running.
=== mds.ceph-node1 ===
mds.ceph-node1: running {"version":"0.80.7"}
[root@ceph-node1 ~]# service ceph status
=== mon.ceph-node1 ===
mon.ceph-node1: running {"version":"0.80.7"}
=== osd.2 ===
osd.2: not running.
=== mds.ceph-node1 ===
mds.ceph-node1: running {"version":"0.80.7"}

I don't understand how to properly restart a node.
Can anyone help me?
Thank you.
Cheers.

    Luca





_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
