ceph mons de-synced from rest of cluster?

All,

We recently doubled the number of OSDs in our cluster, and towards the end of the rebalancing I noticed that recovery IO fell to nothing and that the mons eventually reported the following when I ran ceph -s:

      cluster:
        id:     6a65c3d0-b84e-4c89-bbf7-a38a1966d780
        health: HEALTH_WARN
                34922/4329975 objects misplaced (0.807%)
                Reduced data availability: 542 pgs inactive, 49 pgs peering, 13502 pgs stale
                Degraded data redundancy: 248778/4329975 objects degraded (5.745%), 7319 pgs unclean, 2224 pgs degraded, 1817 pgs undersized

      services:
        mon: 3 daemons, quorum cephmon-0,cephmon-1,cephmon-2
        mgr: cephmon-0(active), standbys: cephmon-1, cephmon-2
        osd: 376 osds: 376 up, 376 in

      data:
        pools:   9 pools, 13952 pgs
        objects: 1409k objects, 5992 GB
        usage:   31528 GB used, 1673 TB / 1704 TB avail
        pgs:     3.225% pgs unknown
                 0.659% pgs not active
                 248778/4329975 objects degraded (5.745%)
                 34922/4329975 objects misplaced (0.807%)
                 6141 stale+active+clean
                 4537 stale+active+remapped+backfilling
                 1575 stale+active+undersized+degraded
                 489  stale+active+clean+remapped
                 450  unknown
                 396  stale+active+recovery_wait+degraded
                 216  stale+active+undersized+degraded+remapped+backfilling
                 40   stale+peering
                 30   stale+activating
                 24   stale+active+undersized+remapped
                 22   stale+active+recovering+degraded
                 13   stale+activating+degraded
                 9    stale+remapped+peering
                 4    stale+active+remapped+backfill_wait
                 3    stale+active+clean+scrubbing+deep
                 2    stale+active+undersized+degraded+remapped+backfill_wait
                 1    stale+active+remapped

The problem is, everything works fine. If I run ceph health detail and then do a pg query against one of the 'degraded' placement groups, it reports back as active+clean. All clients in the cluster can read and write at normal speeds, but no client IO information is ever reported in ceph -s.
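
For reference, this is roughly what I am checking (the pg id below is just an example, not one of our actual pgs):

      # pick one of the pgs that ceph health detail claims is stuck/degraded
      ceph health detail | grep -m1 'is stuck'
      # query that pg directly -- it comes back active+clean
      ceph pg 1.2f query | grep '"state"'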

From what I can see, everything in the cluster is working properly except the reporting of cluster status itself. Has anyone seen this before, or does anyone know how to re-sync the mons with what the OSDs are actually reporting? I see no connectivity errors in the logs of the mons or the OSDs.
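
In case it is relevant, this is how I have been looking for errors (cephmon-0 is one of our mons, osd.0 is just an example, and paths assume the default log locations):

      # the mons still see every osd as up and in
      ceph osd stat
      # nothing obviously wrong in the mon or osd logs
      grep -i -e error -e fault /var/log/ceph/ceph-mon.cephmon-0.log | tail
      grep -i -e error -e fault /var/log/ceph/ceph-osd.0.log | tail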

Thanks,

---
v/r

Chris Apsey
bitskrieg@xxxxxxxxxxxxx
https://www.bitskrieg.net