On Sun, Feb 11, 2018 at 8:19 PM Chris Apsey <bitskrieg@xxxxxxxxxxxxx> wrote:
All,
I recently doubled the number of OSDs in our cluster, and towards the end
of the rebalancing I noticed that recovery I/O fell to nothing and that
the ceph mons eventually looked like this when I ran ceph -s:
  cluster:
    id:     6a65c3d0-b84e-4c89-bbf7-a38a1966d780
    health: HEALTH_WARN
            34922/4329975 objects misplaced (0.807%)
            Reduced data availability: 542 pgs inactive, 49 pgs peering, 13502 pgs stale
            Degraded data redundancy: 248778/4329975 objects degraded (5.745%), 7319 pgs unclean, 2224 pgs degraded, 1817 pgs undersized

  services:
    mon: 3 daemons, quorum cephmon-0,cephmon-1,cephmon-2
    mgr: cephmon-0(active), standbys: cephmon-1, cephmon-2
    osd: 376 osds: 376 up, 376 in

  data:
    pools:   9 pools, 13952 pgs
    objects: 1409k objects, 5992 GB
    usage:   31528 GB used, 1673 TB / 1704 TB avail
    pgs:     3.225% pgs unknown
             0.659% pgs not active
             248778/4329975 objects degraded (5.745%)
             34922/4329975 objects misplaced (0.807%)
             6141 stale+active+clean
             4537 stale+active+remapped+backfilling
             1575 stale+active+undersized+degraded
              489 stale+active+clean+remapped
              450 unknown
              396 stale+active+recovery_wait+degraded
              216 stale+active+undersized+degraded+remapped+backfilling
               40 stale+peering
               30 stale+activating
               24 stale+active+undersized+remapped
               22 stale+active+recovering+degraded
               13 stale+activating+degraded
                9 stale+remapped+peering
                4 stale+active+remapped+backfill_wait
                3 stale+active+clean+scrubbing+deep
                2 stale+active+undersized+degraded+remapped+backfill_wait
                1 stale+active+remapped
The problem is, everything works fine. If I run ceph health detail and
do a pg query against one of the 'degraded' placement groups, it reports
back as active+clean. All clients in the cluster can write and read at
normal speeds, but no I/O information is ever reported in ceph -s.
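For context, the checks look roughly like this (the pg ID below is just an
example; any pg that ceph health detail lists as degraded shows the same thing):

    ceph health detail
    ceph pg 1.2f query | grep '"state"'
    # reports "state": "active+clean" despite what ceph -s shows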
From what I can see, everything in the cluster is working properly
except the actual reporting on the status of the cluster. Has anyone
seen this before, or does anyone know how to sync the mons up to what
the OSDs are actually reporting? I see no connectivity errors in the
logs of the mons or the osds.
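(For what it's worth, the log checks were along these lines; the unit names
and paths assume a systemd-based deployment and will differ otherwise:

    journalctl -u ceph-mon@cephmon-0 --since "2 hours ago" | grep -iE 'error|fault|timeout'
    grep -iE 'error|fault|timeout' /var/log/ceph/ceph-osd.*.log

Nothing relevant turned up on either the mons or the OSDs.)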
It sounds like the manager has gone stale somehow. You can probably fix it by restarting, though if you have logs it would be good to file a bug report at tracker.ceph.com.
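For example, restarting the active mgr (cephmon-0 according to your ceph -s
output) would look something like this on a systemd-based deployment; one of
the standbys should take over and the PG stats should start refreshing again:

    systemctl restart ceph-mgr@cephmon-0
    # give it a few seconds, then check that reporting has caught up
    ceph -s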
-Greg
Thanks,
---
v/r
Chris Apsey
bitskrieg@xxxxxxxxxxxxx
https://www.bitskrieg.net
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com