Re: How many nodes/OSD can fail

Hello Tu,

Yes, that's correct. The mons run on the OSD nodes as well, so I have
3 nodes in total, with an OSD, MDS and mon on each node.
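
With the monitors co-located on the OSD nodes, powering off two of the three nodes leaves only a single monitor, which cannot form a majority, so the whole cluster blocks regardless of any pool's min_size. As a minimal check (assuming the monitor id matches the hostname, e.g. ceph-node1), the surviving monitor can be queried through its local admin socket, which still answers even without quorum:

ceph daemon mon.ceph-node1 mon_status   # the "quorum" list is empty while no majority exists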

Regards - Willi

On 03.07.16 at 09:56, Tu Holmes wrote:

Where are your mon nodes?

Were you mixing mon and OSD together?

Are 2 of the mon nodes down as well?

On Jul 3, 2016 12:53 AM, "Willi Fehler" <willi.fehler@xxxxxxxxxxx> wrote:
Hello Sean,

I've powered down 2 nodes, so 6 of 9 OSDs are down. But my client can no longer read or write from my Ceph mount, and 'ceph -s' hangs as well.

pool 1 'cephfs_data' replicated size 3 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 300 pgp_num 300 last_change 447 flags hashpspool crash_replay_interval 45 stripe_width 0
pool 2 'cephfs_metadata' replicated size 3 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 300 pgp_num 300 last_change 445 flags hashpspool stripe_width 0

2016-07-03 09:49:40.695953 7f3da56f9700  0 -- 192.168.0.5:0/2773396901 >> 192.168.0.7:6789/0 pipe(0x7f3da0001f50 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x7f3da0000f20).fault
2016-07-03 09:49:44.195029 7f3da57fa700  0 -- 192.168.0.5:0/2773396901 >> 192.168.0.6:6789/0 pipe(0x7f3da0005500 sd=4 :0 s=1 pgs=0 cs=0 l=1 c=0x7f3da00067c0).fault
2016-07-03 09:49:50.205788 7f3da55f8700  0 -- 192.168.0.5:0/2773396901 >> 192.168.0.6:6789/0 pipe(0x7f3da0005500 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x7f3da0004c40).fault
2016-07-03 09:49:52.720116 7f3da57fa700  0 -- 192.168.0.5:0/2773396901 >> 192.168.0.7:6789/0 pipe(0x7f3da00023f0 sd=4 :0 s=1 pgs=0 cs=0 l=1 c=0x7f3da00036b0).fault

Regards - Willi

On 03.07.16 at 09:36, Sean Redmond wrote:

It would need to be set to 1
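
For reference, a minimal sketch of doing that for the two CephFS pools named in the dump above (note that min_size 1 lets I/O continue with only a single surviving copy, and it does not help if the monitor quorum is lost):

ceph osd pool set cephfs_data min_size 1
ceph osd pool set cephfs_metadata min_size 1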

On 3 Jul 2016 8:17 a.m., "Willi Fehler" <willi.fehler@xxxxxxxxxxx> wrote:
Hello David,

so in a 3-node cluster, how should I set min_size if I want to be able to lose 2 nodes?

Regards - Willi

On 28.06.16 at 13:07, David wrote:
Hi,

This is probably the min_size on your CephFS data and/or metadata pool. I believe the default is 2; if fewer than 2 replicas are available, I/O will stop. See: http://docs.ceph.com/docs/master/rados/operations/pools/#set-the-number-of-object-replicas
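
As a quick check, the current values can be read per pool, e.g. for the cephfs_data pool from this thread:

ceph osd pool get cephfs_data size       # replica count
ceph osd pool get cephfs_data min_size   # minimum replicas required for I/O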

On Tue, Jun 28, 2016 at 10:23 AM, willi.fehler@xxxxxxxxxxx <willi.fehler@xxxxxxxxxxx> wrote:

Hello,

I'm still very new to Ceph. I've created a small test cluster.

 

ceph-node1: osd0, osd1, osd2

ceph-node2: osd3, osd4, osd5

ceph-node3: osd6, osd7, osd8

 

My CephFS pool has a replication count of 3. I powered off 2 nodes (6 OSDs went down), the cluster status became critical, and my Ceph clients (CephFS) ran into a timeout. My data (I had only one file in the pool) was still on one of the active OSDs. Is it expected behaviour that the cluster status becomes critical and the clients run into a timeout?

 

Many thanks for your feedback.

 

Regards - Willi

 



_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
