Re: Clarification on sequence of recovery and client ops after OSDs rejoin cluster (also, slow requests)

Josh Durgin <jdurgin@xxxxxxxxxx> · Wed, 13 Sep 2017 17:47:26 -0700

On 09/13/2017 03:40 AM, Florian Haas wrote:
> So we have a client that is talking to OSD 30. OSD 30 was never down;
> OSD 17 was. OSD 30 is also the preferred primary for this PG (via
> primary affinity). The OSD now says that
>
> - it does itself have a copy of the object,
> - so does OSD 94,
> - but that the object is "also" missing on OSD 17.
>
> So I'd like to ask firstly: what does "also" mean here?

Nothing. The word is simply part of every log message emitted by the
loop that checks whether objects are missing.

> Secondly, if the local copy is current, and we have no fewer than
> min_size objects, and recovery is meant to be a background operation,
> then why is the recovery in the I/O path here? Specifically, why is
> that the case on a write, where the object is being modified anyway,
> and the modification then needs to be replicated out to OSDs 17 and
> 94?

Mainly because recovery pre-dated the concept of min_size. We realized
this was a problem during Luminous development, but did not complete the
fix for it in time for Luminous. Nice analysis of the issue though!

I'm working on the fix (aka async recovery) for Mimic. This won't be
backportable, unfortunately.
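
To make the difference concrete, here is a toy Python model of the
decision discussed above. It is only a sketch and not Ceph code: the
ToyPG class, the write_waits_for_recovery function, the object name
"obj_a", and min_size=2 are all made up for illustration. The
async_recovery branch reflects an assumed simplified policy (let the
write proceed when at least min_size acting OSDs hold a current copy),
not the actual design.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ToyPG:
    """Hypothetical, simplified placement group (not a Ceph data structure)."""
    acting: list[int]   # OSD ids serving this PG
    min_size: int       # assumed value, chosen for illustration
    # object name -> OSD ids whose copy of that object is missing
    missing: dict[str, set[int]] = field(default_factory=dict)

    def current_copies(self, obj: str) -> list[int]:
        """Return the acting OSDs that hold a current copy of obj."""
        stale = self.missing.get(obj, set())
        return [osd for osd in self.acting if osd not in stale]


def write_waits_for_recovery(pg: ToyPG, obj: str, async_recovery: bool) -> bool:
    """Return True if a client write to obj must wait for recovery first."""
    stale = pg.missing.get(obj, set()) & set(pg.acting)
    if not stale:
        return False    # every acting OSD is current, nothing to recover
    if not async_recovery:
        # Behaviour described in this thread: the object is recovered
        # on the lagging OSD before the write is processed.
        return True
    # Assumed policy for illustration: proceed as long as at least
    # min_size acting OSDs already hold a current copy.
    return len(pg.current_copies(obj)) < pg.min_size


# Scenario from the question: OSDs 30 and 94 are current, OSD 17 is not.
pg = ToyPG(acting=[30, 94, 17], min_size=2, missing={"obj_a": {17}})
blocked_today = write_waits_for_recovery(pg, "obj_a", async_recovery=False)
blocked_async = write_waits_for_recovery(pg, "obj_a", async_recovery=True)
```

In this scenario blocked_today is True and blocked_async is False,
which is the gap described in the question: two current copies satisfy
the assumed min_size of 2, yet the write still waits for OSD 17.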

Josh
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com