Hello,

We recently received a problem report from an internal customer of our S3 service. Our setup consists of roughly 10 servers with 140 OSDs. Our 3 RGWs are colocated with the monitors on dedicated servers, in an HA setup with HAProxy in front. We are running 16.2.14 on Podman with Cephadm. Our S3 handles constant traffic averaging 500 req/s per RGW instance.

The problem is described in this issue: https://tracker.ceph.com/issues/63935.

Basically, this customer has a Grafana Mimir instance pushing to our S3, and during a compaction process it produces a request pattern like this:

```
29/Dec/2023:17:13:28.961 rgw-frontend~ rgw-backend/server-mon-01-rgw0 0/0/0/127/127 200 228 - - ---- 132/132/70/67/0 0/0 "PUT /1234/object HTTP/1.1"
29/Dec/2023:17:13:29.101 rgw-frontend~ rgw-backend/server-mon-01-rgw0 0/0/0/1/1 200 381 - - ---- 132/132/76/71/0 0/0 "GET /1234/object HTTP/1.1"
29/Dec/2023:17:13:29.121 rgw-frontend~ rgw-backend/server-mon-01-rgw0 0/0/0/1/1 200 381 - - ---- 132/132/71/59/0 0/0 "GET /1234/object HTTP/1.1"
29/Dec/2023:17:13:29.137 rgw-frontend~ rgw-backend/server-mon-03-rgw0 0/0/0/4/4 204 153 - - ---- 132/132/71/6/0 0/0 "DELETE /1234/object HTTP/1.1"
29/Dec/2023:19:03:21.671 rgw-frontend~ rgw-backend/server-mon-03-rgw0 0/0/0/1/1 404 472 - - ---- 55/55/26/0/0 0/0 "GET /1234/object HTTP/1.1"
```

It issues a PUT, two GETs and a DELETE on the same object within about one second. Afterwards, the customer still sees the deleted object when doing a ListObjects on the bucket, but if he tries to access it, RGW returns a 404.

After looking in Ceph, it appears that the object still has a bucket index entry, but the associated RADOS object no longer exists. The bucket does not have versioning or object locking enabled.

Has anyone encountered something similar?

Thank you!

Regards,
-- 
Mathias Chapelain
Storage Engineer
Proton AG
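
P.S. In case it helps anyone check their own HAProxy logs for the same pattern, here is a minimal Python sketch that flags objects which receive a PUT followed by a DELETE within a short window. It assumes the log layout shown in the excerpt above. The names `LINE_RE` and `find_put_delete_races` and the one-second default window are illustrative choices, not part of Ceph or HAProxy.

```python
import re
from datetime import datetime, timedelta

# Assumption: lines look like the HAProxy HTTP log excerpt quoted in this post:
# timestamp, frontend, backend/server, timers, status, ..., "METHOD path HTTP/x.y"
LINE_RE = re.compile(
    r'^(?P<ts>\d{2}/\w{3}/\d{4}:\d{2}:\d{2}:\d{2}\.\d{3})\s+'
    r'\S+\s+(?P<backend>\S+)\s+\S+\s+(?P<status>\d{3})\s+.*'
    r'"(?P<method>[A-Z]+) (?P<path>\S+) HTTP/[\d.]+"$'
)

SAMPLE_LOG = """
29/Dec/2023:17:13:28.961 rgw-frontend~ rgw-backend/server-mon-01-rgw0 0/0/0/127/127 200 228 - - ---- 132/132/70/67/0 0/0 "PUT /1234/object HTTP/1.1"
29/Dec/2023:17:13:29.101 rgw-frontend~ rgw-backend/server-mon-01-rgw0 0/0/0/1/1 200 381 - - ---- 132/132/76/71/0 0/0 "GET /1234/object HTTP/1.1"
29/Dec/2023:17:13:29.121 rgw-frontend~ rgw-backend/server-mon-01-rgw0 0/0/0/1/1 200 381 - - ---- 132/132/71/59/0 0/0 "GET /1234/object HTTP/1.1"
29/Dec/2023:17:13:29.137 rgw-frontend~ rgw-backend/server-mon-03-rgw0 0/0/0/4/4 204 153 - - ---- 132/132/71/6/0 0/0 "DELETE /1234/object HTTP/1.1"
29/Dec/2023:19:03:21.671 rgw-frontend~ rgw-backend/server-mon-03-rgw0 0/0/0/1/1 404 472 - - ---- 55/55/26/0/0 0/0 "GET /1234/object HTTP/1.1"
"""


def find_put_delete_races(lines, window=timedelta(seconds=1)):
    """Return (path, seconds) for each DELETE that follows a PUT on the
    same path within `window`. Lines are expected in chronological order."""
    last_put = {}  # path -> timestamp of the most recent PUT
    hits = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m is None:
            continue  # skip blank lines and anything in another format
        ts = datetime.strptime(m["ts"], "%d/%b/%Y:%H:%M:%S.%f")
        path = m["path"]
        if m["method"] == "PUT":
            last_put[path] = ts
        elif m["method"] == "DELETE" and path in last_put:
            delta = ts - last_put[path]
            if timedelta(0) <= delta <= window:
                hits.append((path, delta.total_seconds()))
    return hits


print(find_put_delete_races(SAMPLE_LOG.splitlines()))
```

On the excerpt above this reports `/1234/object`, with the DELETE arriving well under a second after the PUT. It only inspects the frontend log, so it identifies candidate objects to compare against the bucket index rather than proving that an index entry is stale.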