Re: Node crash, filesystem not usable

Below is the information you were asking for.  I think they are size=2, min_size=1.

Dan

# ceph status
    cluster 7bffce86-9d7b-4bdf-a9c9-67670e68ca77                                                                                                                                                                               
     health HEALTH_ERR                                                                                                                                                                                                         
            140 pgs are stuck inactive for more than 300 seconds
            64 pgs backfill_wait
            76 pgs backfilling
            140 pgs degraded
            140 pgs stuck degraded
            140 pgs stuck inactive
            140 pgs stuck unclean
            140 pgs stuck undersized
            140 pgs undersized
            210 requests are blocked > 32 sec
            recovery 38725029/695508092 objects degraded (5.568%)
            recovery 10844554/695508092 objects misplaced (1.559%)
            mds0: Metadata damage detected
            mds0: Behind on trimming (71/30)
            noscrub,nodeep-scrub flag(s) set
     monmap e3: 4 mons at {ceph-0=172.16.31.1:6789/0,ceph-1=172.16.31.2:6789/0,ceph-2=172.16.31.3:6789/0,ceph-3=172.16.31.4:6789/0}
            election epoch 824, quorum 0,1,2,3 ceph-0,ceph-1,ceph-2,ceph-3
      fsmap e144928: 1/1/1 up {0=ceph-0=up:active}, 1 up:standby
     osdmap e35814: 32 osds: 30 up, 30 in; 140 remapped pgs
            flags noscrub,nodeep-scrub,sortbitwise,require_jewel_osds
      pgmap v43142427: 1536 pgs, 2 pools, 762 TB data, 331 Mobjects
            1444 TB used, 1011 TB / 2455 TB avail
            38725029/695508092 objects degraded (5.568%)
            10844554/695508092 objects misplaced (1.559%)
                1396 active+clean
                  76 undersized+degraded+remapped+backfilling+peered
                  64 undersized+degraded+remapped+wait_backfill+peered
recovery io 1244 MB/s, 1612 keys/s, 705 objects/s

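As a sanity check, the degraded and misplaced percentages reported above follow directly from the object counts in the status output; a minimal Python check (numbers copied verbatim from the `ceph status` output):

```python
# Object counts taken from the `ceph status` output above.
degraded = 38725029
misplaced = 10844554
total = 695508092

# Reproduce the percentages ceph reports (three decimal places).
print(f"recovery {degraded}/{total} objects degraded ({degraded / total * 100:.3f}%)")
print(f"recovery {misplaced}/{total} objects misplaced ({misplaced / total * 100:.3f}%)")
```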
ID  WEIGHT     TYPE NAME        UP/DOWN REWEIGHT PRIMARY-AFFINITY
 -1 2619.54541 root default                                      
 -2  163.72159     host ceph-0                                   
  0   81.86079         osd.0         up  1.00000          1.00000
  1   81.86079         osd.1         up  1.00000          1.00000
 -3  163.72159     host ceph-1                                   
  2   81.86079         osd.2         up  1.00000          1.00000
  3   81.86079         osd.3         up  1.00000          1.00000
 -4  163.72159     host ceph-2                                   
  8   81.86079         osd.8         up  1.00000          1.00000
  9   81.86079         osd.9         up  1.00000          1.00000
 -5  163.72159     host ceph-3                                   
 10   81.86079         osd.10        up  1.00000          1.00000
 11   81.86079         osd.11        up  1.00000          1.00000
 -6  163.72159     host ceph-4                                   
  4   81.86079         osd.4         up  1.00000          1.00000
  5   81.86079         osd.5         up  1.00000          1.00000
 -7  163.72159     host ceph-5                                   
  6   81.86079         osd.6         up  1.00000          1.00000
  7   81.86079         osd.7         up  1.00000          1.00000
 -8  163.72159     host ceph-6                                   
 12   81.86079         osd.12        up  0.79999          1.00000
 13   81.86079         osd.13        up  1.00000          1.00000
 -9  163.72159     host ceph-7                                   
 14   81.86079         osd.14        up  1.00000          1.00000
 15   81.86079         osd.15        up  1.00000          1.00000
-10  163.72159     host ceph-8                                   
 16   81.86079         osd.16        up  1.00000          1.00000
 17   81.86079         osd.17        up  1.00000          1.00000
-11  163.72159     host ceph-9                                   
 18   81.86079         osd.18        up  1.00000          1.00000
 19   81.86079         osd.19        up  1.00000          1.00000
-12  163.72159     host ceph-10                                  
 20   81.86079         osd.20        up  1.00000          1.00000
 21   81.86079         osd.21        up  1.00000          1.00000
-13  163.72159     host ceph-11                                  
 22   81.86079         osd.22        up  1.00000          1.00000
 23   81.86079         osd.23        up  1.00000          1.00000
-14  163.72159     host ceph-12                                  
 24   81.86079         osd.24        up  1.00000          1.00000
 25   81.86079         osd.25        up  1.00000          1.00000
-15  163.72159     host ceph-13                                  
 26   81.86079         osd.26      down        0          1.00000
 27   81.86079         osd.27      down        0          1.00000
-16  163.72159     host ceph-14                                  
 28   81.86079         osd.28        up  1.00000          1.00000
 29   81.86079         osd.29        up  1.00000          1.00000
-17  163.72159     host ceph-15                                  
 30   81.86079         osd.30        up  1.00000          1.00000
 31   81.86079         osd.31        up  1.00000          1.00000
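Since the replication settings came up, one way to confirm size/min_size per pool is the ceph CLI; a hedged sketch against a live cluster (the pool name below is an assumption, substitute names from `ceph osd pool ls`):

```shell
# List every pool with its replication settings (size, min_size) in one shot.
ceph osd pool ls detail

# Or query a single pool; "cephfs_data" here is an assumed pool name.
ceph osd pool get cephfs_data size
ceph osd pool get cephfs_data min_size
```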



On 05/11/2018 11:56 AM, David Turner wrote:
Could you share some command output showing the state of your cluster?  Most notable is `ceph status`, but `ceph osd tree` would also be helpful.  What are the sizes of the pools in your cluster?  Are they all size=3, min_size=2?

On Fri, May 11, 2018 at 12:05 PM Daniel Davidson <danield@xxxxxxxxxxxxxxxx> wrote:
Hello,

Today we had a node crash, and looking at it, it seems there is a
problem with the RAID controller, so it is not coming back up, maybe
ever.  It corrupted the local filesystem for the ceph storage there.

The remainder of our storage cluster (10.2.10) is running, and it looks
to be repairing, and our min_size is set to 2.  Normally I would expect
the system to keep running from an end-user perspective when this
happens, but the system is down.  All mounts that were up when this
started look to be stale, and new mounts give the following error:

# mount -t ceph ceph-0:/ /test/ -o
name=admin,secretfile=/etc/ceph/admin.secret,noatime,_netdev,rbytes
mount error 5 = Input/output error

Any suggestions?

Dan

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


