I've had the same issue before during a cluster rebalance. After restarting one of the daemons (can't remember now if it was one of the OSDs or MONs), the values reset to something more sane and the cluster eventually recovered once it reached 0 objects degraded.
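I don't recall the exact commands any more, but the restart-and-watch cycle was roughly the following (osd.12 is only a placeholder id, and the init commands may differ on your distribution):

  # restart one of the daemons and let peering settle
  service ceph restart osd.12
  # watch the degraded count; it should slowly drop back towards 0
  ceph -w
  ceph osd pool stats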
Additionally, when you have a large number of objects to recover, "ceph osd pool stats" will print a negative number of objects to recover and/or a negative total number of objects.

On Thu, Oct 30, 2014 at 10:14 PM, Mike Dawson <mike.dawson@xxxxxxxxxxxx> wrote:
Erik,
I reported a similar issue 22 months ago. I don't think any developer has ever really prioritized these issues.
http://tracker.ceph.com/issues/3720
I was able to recover that cluster. The method I used is in the comments. I have no idea if my cluster was broken for the same reason as yours. Your results may vary.
- Mike Dawson
On 10/30/2014 4:50 PM, Erik Logtenberg wrote:
Thanks for pointing that out. Unfortunately, those tickets contain only
a description of the problem, but no solution or workaround. One was
opened 8 months ago and the other more than a year ago. No love since.
Is there any way I can get my cluster back in a healthy state?
Thanks,
Erik.
On 10/30/2014 05:13 PM, John Spray wrote:
There are a couple of open tickets about bogus (negative) stats on PGs:
http://tracker.ceph.com/issues/5884
http://tracker.ceph.com/issues/7737
Cheers,
John
On Thu, Oct 30, 2014 at 12:38 PM, Erik Logtenberg <erik@xxxxxxxxxxxxx> wrote:
Hi,
Yesterday I removed two OSDs to replace them with new disks. Ceph was
not able to get all PGs back to active+clean; some degraded objects
remain. However, the number of degraded objects is negative
(-82), see below:
2014-10-30 13:31:32.862083 mon.0 [INF] pgmap v209175: 768 pgs: 761
active+clean, 7 active+remapped; 1644 GB data, 2524 GB used, 17210 GB /
19755 GB avail; 2799 B/s wr, 1 op/s; -82/1439391 objects degraded (-0.006%)
According to "rados df", the -82 degraded objects are part of the
cephfs-data-cache pool, which is an SSD-backed replicated pool, that
functions as a cache pool for an HDD-backed erasure coded pool for cephfs.
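For context, the tier is set up in the standard way, roughly like the following ("cephfs-data" stands in for my erasure coded base pool here; the actual name doesn't matter):

  ceph osd tier add cephfs-data cephfs-data-cache
  ceph osd tier cache-mode cephfs-data-cache writeback
  ceph osd tier set-overlay cephfs-data cephfs-data-cache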
The cache should be empty, because I issued the "rados
cache-flush-evict-all" command, and "rados -p cephfs-data-cache ls"
indeed shows zero objects in this pool.
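Concretely, what I ran was along these lines:

  rados -p cephfs-data-cache cache-flush-evict-all
  # verify the pool no longer lists any objects
  rados -p cephfs-data-cache ls
  rados df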
"rados df" however does show 192 objects for this pool, with just 35KB
used and -82 degraded:
pool name           category   KB   objects   clones   degraded   unfound     rd    rd KB        wr        wr KB
cephfs-data-cache   -          35       192        0        -82         0   1119   348800   1198371   1703673493
Please advise...
Thanks,
Erik.
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com