I improved the code that computes degraded objects during
backfill/recovery. In my testing it did not produce percentages above
100%. I'll have to look at the code and verify that some subsequent
changes didn't break things.
David
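For context, the percentage being discussed is the one "ceph -s" reports as
"objects degraded (x%)". A rough way to watch the raw counters behind that
figure while backfill is running is sketched below; the jq paths assume the
Luminous-era "ceph pg dump" JSON layout and may well differ in other
releases.

  # Summary line as reported by the cluster.
  ceph -s | grep degraded

  # Raw per-PG counters that feed the summary (field nesting assumed from
  # the Luminous-era JSON output).
  ceph pg dump --format json 2>/dev/null | \
      jq '[.pg_stats[].stat_sum.num_objects_degraded] | add'
  ceph pg dump --format json 2>/dev/null | \
      jq '[.pg_stats[].stat_sum.num_object_copies] | add'

  # The reported percentage is roughly the first sum over the second, so a
  # figure above 100% means more degraded object instances were counted
  # than there are object copies the cluster expects to hold.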
On 10/13/17 9:55 AM, Florian Haas wrote:
Okay, in that case I've no idea. What was the timeline for the recovery
versus the rados bench and cleanup versus the degraded object counts,
then?
1. Jewel deployment with filestore.
2. Upgrade to Luminous (including mgr deployment and "ceph osd
require-osd-release luminous"), still on filestore.
3. rados bench with subsequent cleanup.
4. All OSDs up, all PGs active+clean.
5. Stop one OSD. Remove from CRUSH, auth list, OSD map.
6. Reinitialize OSD with bluestore.
7. Start OSD, commencing backfill.
8. Degraded objects above 100%.
Please let me know if that information is useful. Thank you!
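For reference, steps 5 through 7 above map onto roughly the following
commands; osd.2 and /dev/sdb are placeholders, and the bluestore
re-provisioning step depends on the deployment tooling in use (ceph-disk
shown here, as was current for Luminous):

  systemctl stop ceph-osd@2               # step 5: stop one OSD
  ceph osd crush remove osd.2             #         remove from CRUSH
  ceph auth del osd.2                     #         remove from auth list
  ceph osd rm 2                           #         remove from the OSD map
  ceph-disk prepare --bluestore /dev/sdb  # step 6: reinitialize with bluestore
  ceph-disk activate /dev/sdb1            # step 7: start OSD, backfill begins
  ceph -s                                 # step 8: degraded count climbs past 100%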
Hmm, that does leave me a little perplexed.
Yeah exactly, me too. :)
David, do we maybe do something with degraded counts based on the number of
objects identified in pg logs? Or some other heuristic for the number of
objects that might be stale? That's the only way I can think of to get these
weird returning sets.
One thing that just crossed my mind: would it make a difference whether or
not the OSD is marked out in the time window between it going down and
being deleted from the crushmap/osdmap? I think it shouldn't (whether it's
marked out or simply non-existent, it's not eligible for holding any data
either way), but I'm not really sure about the mechanics of the internals
here.
Cheers,
Florian
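To make the two timings Florian is asking about concrete, here is a sketch
with a hypothetical osd.2; the open question is whether the degraded
accounting differs between them:

  # Variant A: the OSD is marked out (manually or automatically) while it
  # is still present in the maps, and only then removed.
  systemctl stop ceph-osd@2
  ceph osd out 2
  ceph osd crush remove osd.2 && ceph auth del osd.2 && ceph osd rm 2

  # Variant B: the OSD is deleted from the crushmap/osdmap straight away,
  # without ever being marked out.
  systemctl stop ceph-osd@2
  ceph osd crush remove osd.2 && ceph auth del osd.2 && ceph osd rm 2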