Hi all,

I wanted to call attention to some RGW issues that we've observed on a Pacific cluster over the past several weeks. The problems relate to versioned buckets and index entries that can be left behind after transactions complete abnormally. The scenario is multi-faceted and we're still investigating some of the details, but I wanted to provide a big-picture summary of what we've found so far. It looks like most of these issues should be reproducible on versions before and after Pacific as well.

I'll enumerate the individual issues below:

1. PUT requests during reshard of a versioned bucket fail with 404 and leave behind dark data

Tracker: https://tracker.ceph.com/issues/61359

2. When bucket index ops are cancelled, they can leave behind zombie index entries

The fix for this one was merged a few months ago and did make the v16.2.13 release, but in our case we had billions of extra index entries by the time that we had upgraded to the patched version.

Tracker: https://tracker.ceph.com/issues/58673

3. Issuing a delete for a key that already has a delete marker as the current version leaves behind index entries and OLH objects

Note that the tracker's original description describes the problem a bit differently, but I've clarified the nature of the issue in a comment. (A rough sketch of the request pattern that triggers this is appended below my signature.)

Tracker: https://tracker.ceph.com/issues/59663

The extra index entries and OLH objects left behind by these sorts of issues are obviously a problem in that they consume space unnecessarily, but we've found that they can also cause severe performance degradation for bucket listings, lifecycle processing, and other ops, indirectly via higher OSD latencies.

The reason for the performance impact is that bucket listing calls must repeatedly perform additional OSD ops until they find the requisite number of entries to return. The OSD cls method for bucket listing also does its own internal iteration for the same purpose. Since these leftover entries are invalid, they are skipped. In the case that we observed, where some of our bucket indexes were filled with a sea of contiguous leftover entries, the process of continually iterating over and skipping invalid entries caused enormous read amplification. (A pseudocode sketch of this iteration is also appended below.) I believe that the following tracker describes symptoms related to the same issue: https://tracker.ceph.com/issues/59164. Note that this can also cause LC processing to repeatedly fail when there are enough contiguous invalid entries, since the OSD cls code eventually gives up and returns an error that isn't handled.

The severity of these issues likely varies greatly based upon client behavior. If anyone has experienced similar problems, we'd love to hear about how they've manifested for you so that we can be more confident that we've plugged all of the holes.

Thanks,

Cory Snyder
11:11 Systems
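
P.S. For anyone who wants to reproduce issue 3, the request pattern is roughly as follows. This is just an illustrative sketch using boto3 against a versioned bucket; the endpoint, credentials, bucket, and key names are all placeholders and not from the tracker itself.

    import boto3

    # Placeholder endpoint and credentials; point these at your own RGW.
    s3 = boto3.client(
        "s3",
        endpoint_url="http://rgw.example.com:8080",
        aws_access_key_id="ACCESS_KEY",
        aws_secret_access_key="SECRET_KEY",
    )

    bucket = "repro-bucket"
    key = "repro-key"

    # Delete markers only exist on versioned buckets.
    s3.create_bucket(Bucket=bucket)
    s3.put_bucket_versioning(
        Bucket=bucket,
        VersioningConfiguration={"Status": "Enabled"},
    )

    # Write an object, then delete it twice (without specifying a VersionId).
    s3.put_object(Bucket=bucket, Key=key, Body=b"data")

    # First delete creates a delete marker, which becomes the current version.
    s3.delete_object(Bucket=bucket, Key=key)

    # Second delete targets a key whose current version is already a delete
    # marker; per the tracker, this is the op that can leave stale index
    # entries and OLH objects behind.
    s3.delete_object(Bucket=bucket, Key=key)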
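
P.P.S. Here's a rough pseudocode sketch of the listing iteration I described above. It is not the actual RGW/cls_rgw code (the names and helpers are made up for illustration); it just shows why long contiguous runs of invalid entries translate directly into extra omap reads:

    # Pseudocode only -- illustrative, not the real cls_rgw implementation.
    def list_bucket_shard(shard, start_marker, num_entries):
        results = []
        marker = start_marker
        while len(results) < num_entries:
            # Every iteration is another round of omap reads on the OSD.
            batch = shard.read_omap_entries(after=marker, max_entries=1000)
            if not batch:
                break  # reached the end of the index shard
            for entry in batch:
                marker = entry.key
                if not entry.is_valid():
                    # Leftover/zombie entries are skipped, but we still paid
                    # the cost of reading them.
                    continue
                results.append(entry)
                if len(results) == num_entries:
                    break
            # With millions of contiguous invalid entries, we spin here for a
            # long time before anything is returned to the client.
        return results, marker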