Re: Can't mount Cephfs

Goncalo Borges <goncalo@xxxxxxxxxxxxxxxxxxx> · Fri, 28 Aug 2015 13:20:47 +1000



    Hey Andrzej...

    
    As Jan said, I would first try to recover what I can from the
    Ceph cluster. For the time being, I would not be concerned with
    CephFS.

    
    I would also back up the current OSDs so that, if something goes
    wrong, I can go back to the current state.
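One way to take such a backup, as a minimal sketch: with the OSD daemon stopped (and `ceph osd set noout` set so the cluster does not start rebalancing), `ceph-objectstore-tool` can export each PG on that OSD to a file. The default FileStore layout, the journal location and `BACKUP_DIR` are assumptions, and the helper names are hypothetical. Commands are only printed unless `DRY_RUN=0` is set.

```shell
#!/usr/bin/env bash
# Sketch: export the PGs held by one *stopped* OSD with ceph-objectstore-tool,
# so they can be re-imported later (--op import) if something goes wrong.
# Assumptions: default FileStore layout with the journal at <data>/journal,
# and BACKUP_DIR (hypothetical path) on a disk outside the cluster.
BACKUP_DIR="${BACKUP_DIR:-/backup/ceph-osd}"

# Print each command (DRY_RUN=1, the default) or execute it (DRY_RUN=0).
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# export_osd_pgs OSD_ID PGID...   (hypothetical helper)
export_osd_pgs() {
    local id="$1"; shift
    local data="/var/lib/ceph/osd/ceph-${id}"
    run mkdir -p "${BACKUP_DIR}/osd.${id}"
    for pg in "$@"; do
        run ceph-objectstore-tool --data-path "$data" \
            --journal-path "${data}/journal" \
            --op export --pgid "$pg" \
            --file "${BACKUP_DIR}/osd.${id}/${pg}.export"
    done
}
```

The PG IDs on an OSD can be listed first with `ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-3 --journal-path /var/lib/ceph/osd/ceph-3/journal --op list-pgs`, then exported with e.g. `DRY_RUN=0 export_osd_pgs 3 1.2f 2.a` (OSD 3 and those PG IDs are placeholders).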

    
    Recovering the cluster would consist of working out which data
    is lost. I never had to do this, but naively, I would:

      0) Stop the MDS servers

      1) Find out which PGs had their copies on the OSDs that failed in
      different hosts (running 'ceph health detail' or 'ceph pg dump')

      2) Mark the failed OSDs as lost

      3) Once you are sure which PGs are unrecoverable, mark them as
      lost (ceph pg <id> mark_unfound_lost delete)

      4) Remove the OSDs from the cluster

      5) Give it some time and see if it recovers. The idea is to reach
      a state where the cluster only complains about MDS problems.
      Something like:

      ceph health detail

        HEALTH_WARN mds cluster is degraded
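Steps 1 to 4 can be sketched as shell functions around the standard `ceph` CLI. This is a sketch, not a tested procedure: the function names are hypothetical, destructive commands are only printed unless `DRY_RUN=0` is set, and stopping the MDS daemons (step 0) is left to the init system on each MDS host. Two caveats: `ceph osd lost` is refused for an OSD that is still up, and `mark_unfound_lost` only acts on objects a PG reports as unfound (see `ceph pg <id> list_missing`), so it may not change the state of PGs that are `incomplete`.

```shell
#!/usr/bin/env bash
# Sketch of steps 1-4 with the ceph CLI (function names are hypothetical).
# Destructive commands are printed (DRY_RUN=1, default) or run (DRY_RUN=0).

run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Step 1 (read-only): list stuck PGs and the OSDs they map to.
show_stuck_pgs() {
    ceph health detail
    ceph pg dump_stuck inactive
    ceph pg dump_stuck stale
}

# Step 2: mark failed OSDs as lost; their daemons must already be down.
mark_osds_lost() {
    for id in "$@"; do
        run ceph osd lost "$id" --yes-i-really-mean-it
    done
}

# Step 3: give up on unfound objects in PGs judged unrecoverable.
mark_pgs_lost() {
    for pg in "$@"; do
        run ceph pg "$pg" mark_unfound_lost delete
    done
}

# Step 4: remove OSDs from the CRUSH map, the auth database and the OSD map.
remove_osds() {
    for id in "$@"; do
        run ceph osd out "$id"
        run ceph osd crush remove "osd.${id}"
        run ceph auth del "osd.${id}"
        run ceph osd rm "$id"
    done
}
```

Running e.g. `mark_osds_lost 3 7` without `DRY_RUN=0` only prints the commands, which gives a chance to review them before anything irreversible happens.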

      
    At that point, I think you can start thinking about recovering
    the filesystem. I would ask for a new round of suggestions when
    you reach this stage.

    
    I would also wait for further comments on the procedure above, since
    I have never tried it myself. Finally, I would also suggest taking a
    good look at

       
    http://ceph.com/docs/master/rados/troubleshooting/troubleshooting-pg/

    
    Kind Regards

    Goncalo

    
    On 08/27/2015 12:36 AM, Jan Schermer wrote:

    
      Most of the data is still here, but you won't be able to just
      "mount" it if it's inconsistent.
      

        I don't use CephFS, so someone else will have to tell you
          whether it can repair the filesystem with some parts missing.
        

        You lost the part of the data whose two copies were
          on the failed disk in one node and on either of the failed
          disks in the other node, since no other copy exists. I don't
          know exactly how much data you lost, but since you only have
          16 OSDs I'm afraid it is probably in the order of ~3%. How many
          "files" are intact is a different question: it could be that
          every file is missing 3% of its contents, which would make the
          loss total.
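The order of magnitude of that estimate can be reproduced with a back-of-envelope count. A minimal sketch, assuming (a simplification) that CRUSH spreads the two copies of each PG uniformly over all pairs of OSDs on different hosts; the function name is hypothetical.

```shell
#!/usr/bin/env bash
# estimate_lost_fraction HOSTS OSDS_PER_HOST FAILED_ON_HOST_A FAILED_ON_HOST_B
# (hypothetical helper) Fraction of size-2 PGs with both copies on failed
# disks, assuming replica pairs are spread uniformly over cross-host OSD pairs.
estimate_lost_fraction() {
    awk -v h="$1" -v d="$2" -v a="$3" -v b="$4" 'BEGIN {
        n = h * d
        cross = n * (n - 1) / 2 - h * d * (d - 1) / 2  # OSD pairs on different hosts
        lost = a * b                                   # pairs where both OSDs failed
        printf "%d/%d = %.1f%%\n", lost, cross, 100 * lost / cross
    }'
}
```

For this cluster (4 hosts with 4 OSDs each, 2 failed disks on one host and 1 on another), `estimate_lost_fraction 4 4 2 1` prints `2/96 = 2.1%`. For comparison, the `ceph -s` output in the original message reports 263 of 2496 PGs as incomplete (about 10.5%), so this simple model should be treated with caution.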
        

        Guys? I have no idea how files map to PGs and
          objects in CephFS...
        

        Jan
        

              On 26 Aug 2015, at 14:44, Andrzej Łukawski <alukawski@xxxxxxxxxx> wrote:
              

                  Thank you for the answer. I
                    lost 2 disks on the 1st node and 1 disk on the 2nd.
                    Do I understand correctly that it is not possible to
                    recover the data, even partially? Unfortunately,
                    those disks are lost forever.

                    
                    Andrzej

                    
                    On 2015-08-26 at 12:26, Jan Schermer wrote:

                  
                    If you lost 3 disks with size 2 and at least 2 of
                    those disks were in different hosts, you lost data
                    with the default CRUSH rule.
                    There's nothing you can do but either
                      get those disks back in or restore from backup.
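The two assumptions behind that statement (every pool keeps two copies, and the default CRUSH rule places them on different hosts) can be checked with read-only commands. A minimal sketch; the function name is hypothetical and the pool names are whatever `rados lspools` returns.

```shell
#!/usr/bin/env bash
# check_replication (hypothetical helper): read-only checks of replica count
# per pool, the CRUSH rules, and which OSDs live on which host.
check_replication() {
    for pool in $(rados lspools); do
        echo "== ${pool}"
        ceph osd pool get "$pool" size
        ceph osd pool get "$pool" min_size
    done
    # In the default replicated rule, look for a "chooseleaf_firstn" step
    # with "type": "host", i.e. one copy per host.
    ceph osd crush rule dump
    # Map of hosts to OSDs, to see which failed disks shared a host.
    ceph osd tree
}
```

If any pool reports a size other than 2, or the rule chooses leaves by `osd` instead of `host`, the reasoning above about which data was lost would need to be revisited.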
                    

                    Jan
                    

                          On 26 Aug 2015, at 12:18, Andrzej Łukawski <alukawski@xxxxxxxxxx> wrote:
                          

                             Hi,

                              
                              We have a Ceph cluster (Ceph version 0.94.2)
                              that consists of four nodes with four disks
                              each. Ceph is configured to hold two
                              replicas (size 2). We use this cluster for
                              the Ceph filesystem. A few days ago we had a
                              power outage, after which I had to replace
                              three of our cluster's OSD disks. All OSD
                              disks are now online, but I'm unable to
                              mount the filesystem and constantly receive
                              'mount error 5 = Input/output error'. Ceph
                              status shows many 'incomplete' PGs and that
                              'mds cluster is degraded'. According to
                              'ceph health detail', the MDS is replaying
                              its journal.

                              
                              [root@cnode0 ceph]# ceph -s
                                  cluster 39c717a3-5e15-4e5e-bc54-7e7f1fd0ee24
                                   health HEALTH_WARN
                                          25 pgs backfill_toofull
                                          10 pgs degraded
                                          126 pgs down
                                          263 pgs incomplete
                                          54 pgs stale
                                          10 pgs stuck degraded
                                          263 pgs stuck inactive
                                          54 pgs stuck stale
                                          289 pgs stuck unclean
                                          10 pgs stuck undersized
                                          10 pgs undersized
                                          4 requests are blocked > 32 sec
                                          recovery 27139/10407227 objects degraded (0.261%)
                                          recovery 168597/10407227 objects misplaced (1.620%)
                                          4 near full osd(s)
                                          too many PGs per OSD (312 > max 300)
                                          mds cluster is degraded
                                   monmap e6: 6 mons at {0=x.x.70.1:6789/0,0m=x.x.71.1:6789/0,1=x.x.70.2:6789/0,1m=x.x.71.2:6789/0,2=x.x.70.3:6789/0,2m=x.x.71.3:6789/0}
                                          election epoch 2958, quorum 0,1,2,3,4,5 0,1,2,0m,1m,2m
                                   mdsmap e1236: 1/1/1 up {0=2=up:replay}, 2 up:standby
                                   osdmap e83705: 16 osds: 16 up, 16 in; 26 remapped pgs
                                    pgmap v40869228: 2496 pgs, 3 pools, 16952 GB data, 5046 kobjects
                                          32825 GB used, 11698 GB / 44524 GB avail
                                          27139/10407227 objects degraded (0.261%)
                                          168597/10407227 objects misplaced (1.620%)
                                              2153 active+clean
                                               137 incomplete
                                               126 down+incomplete
                                                54 stale+active+clean
                                                15 active+remapped+backfill_toofull
                                                10 active+undersized+degraded+remapped+backfill_toofull
                                                 1 active+remapped
                              [root@cnode0 ceph]#
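Two of the warnings in this output follow directly from its own numbers. Each of the 2496 PGs is stored twice (size 2) across 16 OSDs, which gives the 312 in "too many PGs per OSD (312 > max 300)"; and the raw usage is about 74% cluster-wide, so the "near full osd(s)" and backfill_toofull lines point at individual OSDs being fuller than average. A minimal check, assuming all three pools use size 2 as described; the helper names are hypothetical.

```shell
#!/usr/bin/env bash
# pgs_per_osd TOTAL_PGS SIZE OSDS_IN   (hypothetical helper)
# Each PG is stored SIZE times; the copies are spread over the OSDs that are in.
pgs_per_osd() {
    echo $(( $1 * $2 / $3 ))
}

# used_pct USED_GB TOTAL_GB   (hypothetical helper): cluster-wide raw usage.
used_pct() {
    awk -v u="$1" -v t="$2" 'BEGIN { printf "%.1f\n", 100 * u / t }'
}

pgs_per_osd 2496 2 16   # prints 312, the value in the health warning
used_pct 32825 44524    # prints 73.7
```

As far as I know, the limit of 300 is the Hammer default of `mon_pg_warn_max_per_osd`.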

                              
                              I wasn't able to find any solution on the
                              Internet, and I worry I will make things
                              even worse if I continue troubleshooting
                              this on my own. I'm stuck. Could you
                              please help?

                              
                              Thanks.

                              Andrzej

                              
_______________________________________________

                            ceph-users mailing list

                            ceph-users@xxxxxxxxxxxxxx

                            http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

                          

    
    -- 
Goncalo Borges
Research Computing
ARC Centre of Excellence for Particle Physics at the Terascale
School of Physics A28 | University of Sydney, NSW  2006
T: +61 2 93511937

  