Hi,

I have a problem with a full cluster and getting it back into a healthy state. Fortunately it's a small test cluster with no valuable data in it. It is used exclusively for RGW/S3 and runs 17.2.3.

I intentionally filled it up via rclone/S3 until it went into HEALTH_ERR, to see what would happen in that situation. At first it sort of looks OK: the cluster apparently goes into a read-only state, and I can still fetch the stored data via S3. But then there seems to be no way to get out of the full state. Via S3 one can't delete any objects or buckets; the requests just hang until they time out. Or did I miss anything?

So I used

  rados rm -p <pool> <obj> --force-full

to delete a bunch of those multipart corpses and other "old" objects. That got the cluster back to HEALTH_OK. But now the RGW gc seems to be screwed up:

# radosgw-admin gc list --include-all | grep oid | wc -l
158109
# radosgw-admin gc process --include-all
# radosgw-admin gc list --include-all | grep oid | wc -l
158109

I.e. it has 158109 objects to clean up, but doesn't clean up anything. I guess that's because the objects it wants to collect don't exist anymore, but are still listed in some index or other list. Is there any way to reset or clean up the gc queue?

I'd appreciate any hints.

Ciao,
Uli
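
P.S. For reference, here is roughly what I did at each step. The remote, bucket, and pool names below are from my test setup.

The fill, and the delete attempts that just hang once the cluster is full:

  rclone copy /var/tmp/filler s3test:testbucket
  rclone delete s3test:testbucket/filler        # hangs until timeout
  rclone purge s3test:testbucket                # same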
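
The bulk cleanup with rados was a loop along these lines (the grep pattern is just my guess at matching the multipart leftovers by object name):

  rados -p default.rgw.buckets.data ls | grep multipart |
  while read -r obj; do
      rados rm -p default.rgw.buckets.data "$obj" --force-full
  done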
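
As for the gc queue itself: my (possibly wrong) understanding is that its state lives on the gc.* objects in the log pool under the "gc" namespace, so that is where I would poke around next:

  # rados -p default.rgw.log -N gc ls
  # rados -p default.rgw.log -N gc listomapkeys gc.0

Given that the data is expendable, would it be safe to just wipe those objects (or their omap entries) to reset the queue, or does RGW keep references elsewhere that this would break?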