I've had the same issue before during a cluster rebalance. After restarting one of the daemons (can't remember now if it was one of the OSDs or MONs), the values reset to something more sane and the cluster eventually recovered once it reached 0 objects degraded.
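I don't recall the exact commands any more, but the restart-and-watch cycle was roughly the following (osd.12 is only a placeholder id, and the init commands may differ on your distribution):

  # restart one of the daemons and let peering settle
  service ceph restart osd.12
  # watch the degraded count; it should slowly drop back towards 0
  ceph -w
  ceph osd pool stats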
Additionally, when you have a large number of objects to recover, "ceph osd pool stats" will print a negative number of objects to recover and/or a negative total number of objects.

On Thu, Oct 30, 2014 at 10:14 PM, Mike Dawson <mike.dawson@xxxxxxxxxxxx> wrote:
Erik,
I reported a similar issue 22 months ago. I don't think any developer has ever really prioritized these issues.
http://tracker.ceph.com/issues/3720
I was able to recover that cluster. The method I used is in the comments. I have no idea if my cluster was broken for the same reason as yours. Your results may vary.
- Mike Dawson
On 10/30/2014 4:50 PM, Erik Logtenberg wrote:
Thanks for pointing that out. Unfortunately, those tickets contain only
a description of the problem, but no solution or workaround. One was
opened 8 months ago and the other more than a year ago. No love since.
Is there any way I can get my cluster back in a healthy state?
Thanks,
Erik.
On 10/30/2014 05:13 PM, John Spray wrote:
There are a couple of open tickets about bogus (negative) stats on PGs:
http://tracker.ceph.com/issues/5884
http://tracker.ceph.com/issues/7737
Cheers,
John
On Thu, Oct 30, 2014 at 12:38 PM, Erik Logtenberg <erik@xxxxxxxxxxxxx> wrote:
Hi,
Yesterday I removed two OSDs to replace them with new disks. Ceph was
not able to get all PGs back to active+clean; some degraded objects
remain. However, the number of degraded objects is negative
(-82), see below:
2014-10-30 13:31:32.862083 mon.0 [INF] pgmap v209175: 768 pgs: 761
active+clean, 7 active+remapped; 1644 GB data, 2524 GB used, 17210 GB /
19755 GB avail; 2799 B/s wr, 1 op/s; -82/1439391 objects degraded (-0.006%)
According to "rados df", the -82 degraded objects are part of the
cephfs-data-cache pool, which is an SSD-backed replicated pool, that
functions as a cache pool for an HDD-backed erasure coded pool for cephfs.
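For context, the tier is set up in the standard way, roughly like the following ("cephfs-data" stands in for my erasure coded base pool here; the actual name doesn't matter):

  ceph osd tier add cephfs-data cephfs-data-cache
  ceph osd tier cache-mode cephfs-data-cache writeback
  ceph osd tier set-overlay cephfs-data cephfs-data-cache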
The cache should be empty, because I issued the "rados
cache-flush-evict-all" command, and "rados -p cephfs-data-cache ls"
indeed shows zero objects in this pool.
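Concretely, what I ran was along these lines:

  rados -p cephfs-data-cache cache-flush-evict-all
  # verify the pool no longer lists any objects
  rados -p cephfs-data-cache ls
  rados df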
"rados df" however does show 192 objects for this pool, with just 35KB
used and -82 degraded:
pool name           category   KB   objects   clones   degraded   unfound     rd    rd KB        wr        wr KB
cephfs-data-cache   -          35       192        0        -82         0   1119   348800   1198371   1703673493
Please advise...
Thanks,
Erik.
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com