This message seems very concerning:
> mds0: Metadata damage detected
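If you want to see exactly what the MDS flagged as damaged, I believe on jewel you can ask the MDS directly (rank 0 / daemon name taken from your fsmap; double-check the exact syntax on your version):

ceph tell mds.0 damage ls
# or, from the MDS host, via the admin socket:
ceph daemon mds.ceph-0 damage ls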
But for the rest, the cluster still seems to be recovering. You could try to speed things up with ceph tell, e.g.:

ceph tell osd.* injectargs '--osd_max_backfills=10'
ceph tell osd.* injectargs '--osd_recovery_sleep=0.0'
ceph tell osd.* injectargs '--osd_recovery_threads=2'
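Those take effect immediately; to confirm they were applied, you can query one of the OSDs over its admin socket (run on that OSD's host; osd.0 is just an example):

ceph daemon osd.0 config get osd_max_backfills
ceph daemon osd.0 config get osd_recovery_sleep

Just remember to dial osd_max_backfills back down afterwards, since high values add extra load to the cluster.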
Regards,
Webert Lima
DevOps Engineer at MAV Tecnologia
Belo Horizonte - Brasil
IRC NICK - WebertRLZ
On Fri, May 11, 2018 at 3:06 PM Daniel Davidson <danield@xxxxxxxxxxxxxxxx> wrote:
Below is the information you were asking for. I think they are size=2, min_size=1.
Dan
# ceph status
    cluster 7bffce86-9d7b-4bdf-a9c9-67670e68ca77
     health HEALTH_ERR
            140 pgs are stuck inactive for more than 300 seconds
            64 pgs backfill_wait
            76 pgs backfilling
            140 pgs degraded
            140 pgs stuck degraded
            140 pgs stuck inactive
            140 pgs stuck unclean
            140 pgs stuck undersized
            140 pgs undersized
            210 requests are blocked > 32 sec
            recovery 38725029/695508092 objects degraded (5.568%)
            recovery 10844554/695508092 objects misplaced (1.559%)
            mds0: Metadata damage detected
            mds0: Behind on trimming (71/30)
            noscrub,nodeep-scrub flag(s) set
     monmap e3: 4 mons at {ceph-0=172.16.31.1:6789/0,ceph-1=172.16.31.2:6789/0,ceph-2=172.16.31.3:6789/0,ceph-3=172.16.31.4:6789/0}
            election epoch 824, quorum 0,1,2,3 ceph-0,ceph-1,ceph-2,ceph-3
      fsmap e144928: 1/1/1 up {0=ceph-0=up:active}, 1 up:standby
     osdmap e35814: 32 osds: 30 up, 30 in; 140 remapped pgs
            flags noscrub,nodeep-scrub,sortbitwise,require_jewel_osds
      pgmap v43142427: 1536 pgs, 2 pools, 762 TB data, 331 Mobjects
            1444 TB used, 1011 TB / 2455 TB avail
            38725029/695508092 objects degraded (5.568%)
            10844554/695508092 objects misplaced (1.559%)
                1396 active+clean
                  76 undersized+degraded+remapped+backfilling+peered
                  64 undersized+degraded+remapped+wait_backfill+peered
recovery io 1244 MB/s, 1612 keys/s, 705 objects/s

# ceph osd tree
 ID     WEIGHT TYPE NAME        UP/DOWN REWEIGHT PRIMARY-AFFINITY
 -1 2619.54541 root default
 -2  163.72159     host ceph-0
  0   81.86079         osd.0         up  1.00000          1.00000
  1   81.86079         osd.1         up  1.00000          1.00000
 -3  163.72159     host ceph-1
  2   81.86079         osd.2         up  1.00000          1.00000
  3   81.86079         osd.3         up  1.00000          1.00000
 -4  163.72159     host ceph-2
  8   81.86079         osd.8         up  1.00000          1.00000
  9   81.86079         osd.9         up  1.00000          1.00000
 -5  163.72159     host ceph-3
 10   81.86079         osd.10        up  1.00000          1.00000
 11   81.86079         osd.11        up  1.00000          1.00000
 -6  163.72159     host ceph-4
  4   81.86079         osd.4         up  1.00000          1.00000
  5   81.86079         osd.5         up  1.00000          1.00000
 -7  163.72159     host ceph-5
  6   81.86079         osd.6         up  1.00000          1.00000
  7   81.86079         osd.7         up  1.00000          1.00000
 -8  163.72159     host ceph-6
 12   81.86079         osd.12        up  0.79999          1.00000
 13   81.86079         osd.13        up  1.00000          1.00000
 -9  163.72159     host ceph-7
 14   81.86079         osd.14        up  1.00000          1.00000
 15   81.86079         osd.15        up  1.00000          1.00000
-10  163.72159     host ceph-8
 16   81.86079         osd.16        up  1.00000          1.00000
 17   81.86079         osd.17        up  1.00000          1.00000
-11  163.72159     host ceph-9
 18   81.86079         osd.18        up  1.00000          1.00000
 19   81.86079         osd.19        up  1.00000          1.00000
-12  163.72159     host ceph-10
 20   81.86079         osd.20        up  1.00000          1.00000
 21   81.86079         osd.21        up  1.00000          1.00000
-13  163.72159     host ceph-11
 22   81.86079         osd.22        up  1.00000          1.00000
 23   81.86079         osd.23        up  1.00000          1.00000
-14  163.72159     host ceph-12
 24   81.86079         osd.24        up  1.00000          1.00000
 25   81.86079         osd.25        up  1.00000          1.00000
-15  163.72159     host ceph-13
 26   81.86079         osd.26      down        0          1.00000
 27   81.86079         osd.27      down        0          1.00000
-16  163.72159     host ceph-14
 28   81.86079         osd.28        up  1.00000          1.00000
 29   81.86079         osd.29        up  1.00000          1.00000
-17  163.72159     host ceph-15
 30   81.86079         osd.30        up  1.00000          1.00000
 31   81.86079         osd.31        up  1.00000          1.00000
On 05/11/2018 11:56 AM, David Turner wrote:
Can you share the output of some commands that show the state of your cluster? Most notable is `ceph status`, but `ceph osd tree` would also be helpful. What are the sizes of the pools in your cluster? Are they all size=3, min_size=2?
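Something like this should list the size/min_size of each pool (either form works on jewel, if I remember right):

ceph osd pool ls detail
# or
ceph osd dump | grep 'replicated size'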
On Fri, May 11, 2018 at 12:05 PM Daniel Davidson <danield@xxxxxxxxxxxxxxxx> wrote:
Hello,
Today we had a node crash, and looking at it, it seems there is a
problem with the RAID controller, so it is not coming back up, maybe
ever. It corrupted the local filesystem for the ceph storage there.

The remainder of our storage cluster (10.2.10) is running and looks
to be repairing, and our min_size is set to 2. Normally I would expect
the system to keep running from an end-user perspective when this
happens, but it is down. All mounts that were up when this started
look to be stale, and new mounts give the following error:
# mount -t ceph ceph-0:/ /test/ -o name=admin,secretfile=/etc/ceph/admin.secret,noatime,_netdev,rbytes
mount error 5 = Input/output error
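If it is useful, I can also post the kernel client messages from one of the clients and the full health detail from a monitor, e.g.:

dmesg | grep -i ceph
ceph health detail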
Any suggestions?
Dan
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com