Ah ok, glad to see it’s on the radar! Just to add onto this with our own findings:

1. RadosGW reports all of the pieced objects, however they are not visible with s3cmd or any other client-side application.

2. Somewhere along the line the pieces are being lost in the registry and not marked for automatic removal by Ceph. However, setting the bucket shards to 0 and then running a bucket check command fixes the issue:

# radosgw-admin bucket check --check-objects --fix --bucket sgbackup1

But this only works when bucket shards are set to 0. If the bucket check command is run on a bucket with > 0 shards, it fails to remove the data.

The ticket points out that the remaining orphans have a different ID than the pieces which are recombined. The fact that these pieces aren’t visible from a client-side service accessing Ceph metadata suggests it’s an issue with the way Ceph is tracking the pieces internally. The method for cataloging multipart objects seems not to have taken shards into account, or to have forgotten the system “shadow tags” added onto the upload ID to differentiate the pieces.

To recap, we are able to successfully remove the orphaned objects with a bucket check command only on buckets with 0 shards:

# radosgw-admin bucket check --check-objects --fix --bucket sgbackup1

Setting bucket shards to a lower (but non-zero) number doesn’t change anything. After running this command and then a bucket check, the orphaned data remains:

[root@os1-sin1 ~]# radosgw-admin reshard add --bucket vgood-test --num-shards 7 --yes-i-really-mean-it
[root@os1-sin1 ~]# radosgw-admin reshard list
[
    {
        "time": "2020-09-24T17:14:42.189517Z",
        "tenant": "",
        "bucket_name": "vgood-test",
        "bucket_id": "d8c6ebd1-2bab-414d-9d6b-73bf9bc8fc5a.12045805.1",
        "new_instance_id": "",
        "old_num_shards": 11,
        "new_num_shards": 7
    }
]

However, setting bucket shards to 0 and then running the bucket check command removed the orphaned data:

[root@os1-sin1 ~]# radosgw-admin reshard add --bucket vgood-test --num-shards 0 --yes-i-really-mean-it
[root@os1-sin1 ~]# radosgw-admin reshard list
[
    {
        "time": "2020-09-24T17:23:34.843021Z",
        "tenant": "",
        "bucket_name": "vgood-test",
        "bucket_id": "d8c6ebd1-2bab-414d-9d6b-73bf9bc8fc5a.14335315.1",
        "new_instance_id": "",
        "old_num_shards": 7,
        "new_num_shards": 0
    }
]
[root@os1-sin1 ~]# radosgw-admin reshard process
2020-09-24T13:23:50.895-0400 7f24a0e47200 1 execute INFO: reshard of bucket "vgood-test" from "vgood-test:d8c6ebd1-2bab-414d-9d6b-73bf9bc8fc5a.14335315.1" to "vgood-test:d8c6ebd1-2bab-414d-9d6b-73bf9bc8fc5a.14335720.1" completed successfully

Is there any word on where this behavior might be originating? I updated the ticket with this additional info, and we would be glad to contribute any resources we can to help introduce a patch. We’re facing the same issues as described in the ticket: running these cleanup commands might be feasible for smaller buckets, but it is very unwieldy at the cluster sizes we’re running, and we’re also losing a few terabytes of capacity to the orphaned data.
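For anyone trying to reproduce finding 1, this is roughly how we’ve been comparing what RGW reports against what a client actually sees. A minimal sketch, assuming the test bucket from the examples above and that s3cmd is pointed at the RGW endpoint:

# RGW-side accounting; per finding 1 above, the orphaned pieces
# are still counted in the bucket's usage section here
radosgw-admin bucket stats --bucket vgood-test
# Client-side view: visible object sizes and any in-progress
# multipart uploads -- the orphaned pieces show up in neither
s3cmd du s3://vgood-test
s3cmd multipart s3://vgood-test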
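To get a feel for how much is stranded, we also looked at the raw RADOS objects directly. Another sketch, with loud assumptions: the data pool name (default.rgw.buckets.data here) and the __multipart_/__shadow_ object-name prefixes match our deployment and may differ on yours:

BUCKET=vgood-test
POOL=default.rgw.buckets.data   # assumed data pool name
# The bucket marker prefixes every RADOS object belonging to the bucket
MARKER=$(radosgw-admin bucket stats --bucket "$BUCKET" | grep -oP '"marker": "\K[^"]+')
# Count the multipart pieces and shadow objects left behind
rados -p "$POOL" ls | grep -c "^${MARKER}__multipart_"
rados -p "$POOL" ls | grep -c "^${MARKER}__shadow_"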
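And in case it’s useful to anyone hitting the same thing, here is the workaround wrapped as a script. This is just the sequence from the session above (reshard to 0, process, then bucket check --fix); restoring the original shard count at the end is our own addition and worth trying on a throwaway bucket first:

#!/bin/bash
# Sketch of the workaround described above; BUCKET and ORIG_SHARDS
# are placeholders for your bucket and its pre-workaround shard count.
BUCKET=vgood-test
ORIG_SHARDS=11
# Drop to 0 shards -- bucket check --fix only removes orphans there
radosgw-admin reshard add --bucket "$BUCKET" --num-shards 0 --yes-i-really-mean-it
radosgw-admin reshard process
# With 0 shards the fix actually removes the orphaned pieces
radosgw-admin bucket check --check-objects --fix --bucket "$BUCKET"
# Restore the original shard count (untested assumption: resharding
# back up afterwards is safe)
radosgw-admin reshard add --bucket "$BUCKET" --num-shards "$ORIG_SHARDS" --yes-i-really-mean-it
radosgw-admin reshard process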
- Gavin