>> > Okay, in that case I've no idea. What was the timeline for the recovery
>> > versus the rados bench and cleanup versus the degraded object counts,
>> > then?
>>
>> 1. Jewel deployment with filestore.
>> 2. Upgrade to Luminous (including mgr deployment and "ceph osd
>>    require-osd-release luminous"), still on filestore.
>> 3. rados bench with subsequent cleanup.
>> 4. All OSDs up, all PGs active+clean.
>> 5. Stop one OSD. Remove it from CRUSH, the auth list, and the OSD map.
>> 6. Reinitialize the OSD with bluestore.
>> 7. Start the OSD, commencing backfill.
>> 8. Degraded objects above 100%.
>>
>> Please let me know if that information is useful. Thank you!
>
> Hmm, that does leave me a little perplexed.

Yeah, exactly — me too. :)

> David, do we maybe do something with degraded counts based on the number
> of objects identified in pg logs? Or some other heuristic for the number
> of objects that might be stale? That's the only way I can think of to
> get these weird returning sets.

One thing that just crossed my mind: would it make a difference whether
or not the OSD is marked out in the time window between it going down and
being deleted from the crushmap/osdmap? I think it shouldn't (whether
marked out or simply non-existent, it's not eligible to hold any data
either way), but I'm not really sure about the mechanics of the internals
here.

Cheers,
Florian

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com