Starting in stable release Octopus 15.2.0 and continuing through Octopus 15.2.6, there is a bug in RGW that can result in data loss. An immediate configuration work-around is available, and a fix is planned for Octopus 15.2.7. [Note: the bug was first merged in a pre-stable release, Octopus 15.1.0.]

The bug is triggered when a read of a large RGW object (i.e., one with at least one tail segment) takes longer than half the time specified in the configuration option rgw_gc_obj_min_wait (2 hours by default, specified in seconds). The bug causes the tail segments of the object being read to be added to the RGW garbage collection queue, which in turn causes them to be deleted after a period of time.

The configuration work-around is to set rgw_gc_obj_min_wait to a value (in seconds) greater than twice the duration of the longest read you expect; an illustrative example appears at the end of this message. The downside of this configuration change is that it delays GC of deleted objects and will tend to make the GC queue longer. Because there is a finite amount of space for that queue, if it ever becomes full, tail segments will be deleted in-line with object removal operations, which can degrade performance slightly.

The Octopus backport tracker is: https://tracker.ceph.com/issues/48331
The Octopus backport PR is: https://github.com/ceph/ceph/pull/38249
The master branch tracker, which has the history of the bug, is: https://tracker.ceph.com/issues/47866

Tracking down this bug was a group effort and many people participated. See the master branch tracker for that history. Thanks to everyone who helped out.

Eric

--
J. Eric Ivancich
he / him / his
Red Hat Storage
Ann Arbor, Michigan, USA
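
P.S. Below is a minimal sketch of the work-around, assuming the cluster uses the centralized config database and assuming that 86400 seconds (24 hours) comfortably exceeds twice your longest expected read; the value and the instance name are examples only, so adjust them for your environment. RGW daemons may need a restart to pick up the change.

    # Raise rgw_gc_obj_min_wait (in seconds) for all RGW daemons:
    ceph config set client.rgw rgw_gc_obj_min_wait 86400

    # Confirm the stored value:
    ceph config get client.rgw rgw_gc_obj_min_wait

    # Or, equivalently, set it in ceph.conf for each RGW instance and
    # restart the daemon ("myhost" is a hypothetical instance name):
    #   [client.rgw.myhost]
    #   rgw_gc_obj_min_wait = 86400

Once a release containing the fix is deployed, the option can be returned to its default.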