On 02/20/2018 07:56 PM, Sage Weil wrote:
On Tue, 20 Feb 2018, Wido den Hollander wrote:
On 02/20/2018 03:05 PM, Dan van der Ster wrote:
Hi Wido,
When you finish updating all osds in a cluster to luminous, the last step:
ceph osd require-osd-release luminous
actually sets the recovery_deletes flag.
All our luminous clusters have this enabled:
# ceph osd dump | grep recovery
flags sortbitwise,recovery_deletes
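For reference, both steps together look roughly like this (plain shell against
the standard ceph CLI; the dump output is the same line as above):

  # finalize once every OSD in the cluster is running luminous
  ceph osd require-osd-release luminous

  # confirm the flag is now set
  ceph osd dump | grep recovery
  # -> flags sortbitwise,recovery_deletes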
Yes, I noticed.
And that super secret redhat link explains that recovery_deletes
allows deletes to take place during recovery instead of at peering
time, which was previously the case.
Ok! The source told me that as well, but can somebody tell me the exact
benefit of this?
Does it improve/smoothen the peering process?
Yes!
I heard rumors that it makes peering block less, but I'm not sure. I'd like to
hear facts or experiences :)
We saw this in the sepia lab cluster, which has a big CephFS file system
that archives all of our test results. Lots of data ingested
continuously, and some cron jobs to clean up old test results that
passed or are very old. When the delete jobs are running, the pg logs
end up with lots of delete entries. If you have an OSD go down and
then come back up having missed some of those deletes, all of the
deleted objects in the log would be synchronously deleted in order
for peering to progress, blocking IO and making PGs appear stuck in
the 'peering' state. This change fixes that: peering completes immediately,
and the deletes are done asynchronously, just like modified or new
objects are during recovery.
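If you want to see the difference in practice, a rough way to watch it is
something like this (plain shell against the standard ceph CLI; the grep
patterns and the 2-second interval are just illustrative):

  # the flag has to be active for the new behaviour
  ceph osd dump | grep recovery_deletes

  # restart one OSD after a large delete job and watch its PGs; with
  # recovery_deletes set they should pass through 'peering' quickly and
  # do the deletes while 'active+recovering' instead
  watch -n 2 "ceph pg dump pgs_brief 2>/dev/null | egrep 'peering|recovering'"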
Ok, that explains it! Thanks! Nothing was to be found about this flag.
Search engines should pick this up now and other people can find it as
well :)
Wido
sage