Out of desperation, I started with the
disaster recovery guide:
http://docs.ceph.com/docs/jewel/cephfs/disaster-recovery/
After exporting the journal, I started running:
cephfs-journal-tool event recover_dentries summary
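For reference, the export step from that page was, if I remember right, something like this (backup.bin is just a placeholder filename, not my actual path):

cephfs-journal-tool journal export backup.bin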
That was about 7 hours ago, and it is still running. I am getting a lot of messages like:
2017-10-24 21:24:10.910489 7f775e539bc0 1 scavenge_dentries: frag 607.00000000 is corrupt, overwriting
The frag number is the same in every line (607, which matches the ~mds0/stray7 directory from the error quoted below), and there have been thousands of them.
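If it ever finishes, I assume I can sanity-check the journal again with something like:

cephfs-journal-tool journal inspect

but I do not want to interrupt it in the meantime.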
I really could use some assistance,
Dan
On 10/24/2017 12:14 PM, Daniel Davidson wrote:
Our Ceph system is having a problem.
A few days ago we had a pg that was marked as inconsistent, and today I fixed it with a:
#ceph pg repair 1.37c
Then a file was stuck as missing, so I did a:
#ceph pg 1.37c mark_unfound_lost delete
pg has 1 objects unfound and apparently lost marking
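For anyone who hits the same thing: I believe the way to see which objects are unfound before deleting them is roughly the following, though I am going from memory on the exact commands and output:

#ceph health detail
#ceph pg 1.37c list_missing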
That fixed the unfound file problem and all the pgs went
active+clean. A few minutes later, though, the FS seemed to pause
and the MDS started giving errors.
# ceph -w
    cluster 7bffce86-9d7b-4bdf-a9c9-67670e68ca77
     health HEALTH_ERR
            mds rank 0 is damaged
            mds cluster is degraded
            noscrub,nodeep-scrub flag(s) set
     monmap e3: 4 mons at {ceph-0=172.16.31.1:6789/0,ceph-1=172.16.31.2:6789/0,ceph-2=172.16.31.3:6789/0,ceph-3=172.16.31.4:6789/0}
            election epoch 652, quorum 0,1,2,3 ceph-0,ceph-1,ceph-2,ceph-3
      fsmap e121409: 0/1/1 up, 4 up:standby, 1 damaged
     osdmap e35220: 32 osds: 32 up, 32 in
            flags noscrub,nodeep-scrub,sortbitwise,require_jewel_osds
      pgmap v28398840: 1536 pgs, 2 pools, 795 TB data, 329 Mobjects
            1595 TB used, 1024 TB / 2619 TB avail
                1536 active+clean
Looking at the logs when I try a:
#ceph mds repaired 0
I see:
2017-10-24 12:01:27.354271 mds.0 172.16.31.3:6801/1949050374 75 : cluster [ERR] dir 607 object missing on disk; some files may be lost (~mds0/stray7)
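I have not yet checked whether that directory object really is gone from the metadata pool; if I have the object naming right, that check would be something like:

#rados -p cephfs_metadata stat 607.00000000

where cephfs_metadata is just a placeholder for whatever the metadata pool is actually named here.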
Any ideas as to what to do next? I am stumped.
Dan