On Tue, 20 Feb 2018, Wido den Hollander wrote:
> On 02/20/2018 03:05 PM, Dan van der Ster wrote:
> > Hi Wido,
> >
> > When you finish updating all osds in a cluster to luminous, the last step:
> >
> > ceph osd require-osd-release luminous
> >
> > actually sets the recovery_deletes flag.
> >
> > All our luminous clusters have this enabled:
> >
> > # ceph osd dump | grep recovery
> > flags sortbitwise,recovery_deletes
> >
>
> Yes, I noticed.
>
> > And that super secret redhat link explains that recovery_deletes
> > allows deletes to take place during recovery instead of at peering
> > time, which was previously the case.
> >
>
> Ok! The source told me that as well, but can somebody tell me the exact
> benefit of this?
>
> Does it improve/smoothen the peering process?

Yes!

> I heard rumors that it makes peering block less, but I'm not sure. I'd like
> to hear facts or experiences :)

We saw this in the sepia lab cluster, which has a big CephFS file system
that archives all of our test results. Lots of data is ingested
continuously, and some cron jobs clean up test results that are passes or
very old.

When the delete jobs are running, the pg logs end up with lots of delete
entries. If an OSD went down and then came back up having missed some of
those deletes, all of the deleted objects in the log would be synchronously
deleted in order for peering to progress, blocking IO and making PGs appear
stuck in the 'peering' state.

This change fixes that: peering completes immediately, and the deletes are
done asynchronously, just like modified or new objects would be during
recovery.

sage
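
(For reference, a minimal sketch of how to check this on a cluster, based on
the commands quoted above and assuming the standard ceph CLI; the 'pgs_brief'
dump format is an assumption about your release:)

    # confirm the flag is set after 'ceph osd require-osd-release luminous'
    ceph osd dump | grep recovery_deletes

    # with the flag set, PGs should no longer pile up in 'peering' while a
    # backlog of deletes drains; count them during a delete-heavy recovery
    ceph pg dump pgs_brief | grep -c peering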