Hi all,

I wanted to call attention to some RGW issues that we've observed on a Pacific cluster over the past several weeks. The problems relate to versioned buckets and index entries that can be left behind after transactions complete abnormally. The scenario is multi-faceted and we're still investigating some of the details, but I wanted to provide a big-picture summary of what we've found so far. It looks like most of these issues should be reproducible on versions before and after Pacific as well.

I'll enumerate the individual issues below:

1. PUT requests during reshard of a versioned bucket fail with 404 and leave behind dark data

Tracker: https://tracker.ceph.com/issues/61359

2. When bucket index ops are cancelled, they can leave behind zombie index entries

The fix for this one was merged a few months ago and did make the v16.2.13 release, but in our case we had billions of extra index entries by the time that we had upgraded to the patched version.

Tracker: https://tracker.ceph.com/issues/58673

3. Issuing a delete for a key that already has a delete marker as the current version leaves behind index entries and OLH objects

Note that the tracker's original description describes the problem a bit differently, but I've clarified the nature of the issue in a comment. (A rough sketch of the request pattern that triggers this is appended below my signature.)

Tracker: https://tracker.ceph.com/issues/59663

The extra index entries and OLH objects left behind by these sorts of issues are obviously a problem in that they consume space unnecessarily, but we've found that they can also cause severe performance degradation for bucket listings, lifecycle processing, and other ops, indirectly via higher OSD latencies.

The reason for the performance impact is that bucket listing calls must repeatedly perform additional OSD ops until they find the requisite number of entries to return. The OSD cls method for bucket listing also does its own internal iteration for the same purpose. Since these leftover entries are invalid, they are skipped. In the case that we observed, where some of our bucket indexes were filled with a sea of contiguous leftover entries, the process of continually iterating over and skipping invalid entries caused enormous read amplification. (A pseudocode sketch of this iteration is also appended below.) I believe that the following tracker describes symptoms related to the same issue: https://tracker.ceph.com/issues/59164. Note that this can also cause LC processing to repeatedly fail when there are enough contiguous invalid entries, since the OSD cls code eventually gives up and returns an error that isn't handled.

The severity of these issues likely varies greatly based upon client behavior. If anyone has experienced similar problems, we'd love to hear about how they've manifested for you so that we can be more confident that we've plugged all of the holes.

Thanks,

Cory Snyder
11:11 Systems
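
P.S. For anyone who wants to reproduce issue 3, the request pattern is roughly as follows. This is just an illustrative sketch using boto3 against a versioned bucket; the endpoint, credentials, bucket, and key names are all placeholders and not from the tracker itself.

    import boto3

    # Placeholder endpoint and credentials; point these at your own RGW.
    s3 = boto3.client(
        "s3",
        endpoint_url="http://rgw.example.com:8080",
        aws_access_key_id="ACCESS_KEY",
        aws_secret_access_key="SECRET_KEY",
    )

    bucket = "repro-bucket"
    key = "repro-key"

    # Delete markers only exist on versioned buckets.
    s3.create_bucket(Bucket=bucket)
    s3.put_bucket_versioning(
        Bucket=bucket,
        VersioningConfiguration={"Status": "Enabled"},
    )

    # Write an object, then delete it twice (without specifying a VersionId).
    s3.put_object(Bucket=bucket, Key=key, Body=b"data")

    # First delete creates a delete marker, which becomes the current version.
    s3.delete_object(Bucket=bucket, Key=key)

    # Second delete targets a key whose current version is already a delete
    # marker; per the tracker, this is the op that can leave stale index
    # entries and OLH objects behind.
    s3.delete_object(Bucket=bucket, Key=key)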
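
P.P.S. Here's a rough pseudocode sketch of the listing iteration I described above. It is not the actual RGW/cls_rgw code (the names and helpers are made up for illustration); it just shows why long contiguous runs of invalid entries translate directly into extra omap reads:

    # Pseudocode only -- illustrative, not the real cls_rgw implementation.
    def list_bucket_shard(shard, start_marker, num_entries):
        results = []
        marker = start_marker
        while len(results) < num_entries:
            # Every iteration is another round of omap reads on the OSD.
            batch = shard.read_omap_entries(after=marker, max_entries=1000)
            if not batch:
                break  # reached the end of the index shard
            for entry in batch:
                marker = entry.key
                if not entry.is_valid():
                    # Leftover/zombie entries are skipped, but we still paid
                    # the cost of reading them.
                    continue
                results.append(entry)
                if len(results) == num_entries:
                    break
            # With millions of contiguous invalid entries, we spin here for a
            # long time before anything is returned to the client.
        return results, marker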