On 3/17/22 17:16, Ulrich Klein wrote:
Hi,

My second attempt to get help with a problem I've been trying to solve for about six months now.

I have a Ceph 16.2.6 test cluster, used almost exclusively for providing RGW/S3 service, similar to a production cluster.

The problem I have is this: A client uploads (via S3) a bunch of large files into a bucket via multipart uploads. The upload(s) get interrupted and retried. In the end, from the client's perspective, all the files are visible and everything looks fine. But on the cluster there are many more objects in the bucket. Even after cleaning out the incomplete multipart uploads there are too many objects, and even after deleting all the visible objects from the bucket there are still objects left in it.

I have so far found no way to get rid of those left-over objects. It's screwing up space accounting, and I'm afraid I'll eventually have a cluster full of those lost objects. The only way to clean up seems to be to copy the contents of a bucket to a new bucket and delete the screwed-up bucket. But on a production system that's not always a real option.

I've found a variety of older threads that describe a similar problem, but none of them describes a solution :(

I can pretty easily reproduce the problem with this sequence:

1. On a client system, create a directory with ~30 200 MB files (on a faster system I'd probably need bigger or more files): tstfiles/tst01 - tst29
2. Run
   $ rclone mkdir tester:/test-bucket
   which creates a bucket on the test system with user tester.
3. Run
   $ rclone sync -v tstfiles tester:/test-bucket/tstfiles
   a couple of times (6-8), interrupting each one via Ctrl-C.
4. Eventually let one finish.

Now I can use s3cmd to see all the files:

$ s3cmd ls -lr s3://test-bucket/tstfiles
2022-03-16 17:11 200M ecb28853bd18eeae185b0b12bd47333c-40 STANDARD s3://test-bucket/tstfiles/tst01
...
2022-03-16 17:13 200M ecb28853bd18eeae185b0b12bd47333c-40 STANDARD s3://test-bucket/tstfiles/tst29
...
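For what it's worth, the interrupted-sync reproduction above can be automated with a short script. This is only a sketch of the steps described in the post: it assumes rclone is installed, a remote named "tester" is configured, and tstfiles/ already holds the test files; the round count and interrupt delay are arbitrary choices, not values from the post.

```python
#!/usr/bin/env python3
"""Sketch: automate the interrupted rclone syncs described above."""
import signal
import subprocess
import time

REMOTE = "tester:/test-bucket"  # assumed remote/bucket name from the post
SRC = "tstfiles"


def sync_cmd(src=SRC, remote=REMOTE):
    """The rclone invocation used for each round."""
    return ["rclone", "sync", "-v", src, f"{remote}/{src}"]


def interrupted_sync(delay=20):
    """Start a sync, then send SIGINT (like Ctrl-C) after `delay` seconds."""
    proc = subprocess.Popen(sync_cmd())
    time.sleep(delay)
    proc.send_signal(signal.SIGINT)
    proc.wait()


if __name__ == "__main__":
    subprocess.run(["rclone", "mkdir", REMOTE], check=True)
    for _ in range(7):  # 6-8 interrupted rounds, per the post
        interrupted_sync()
    subprocess.run(sync_cmd(), check=True)  # final round runs to completion
```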
And to list the incomplete uploads:

$ s3cmd multipart s3://test-bucket
s3://test-bucket/
Initiated	Path	Id
2022-03-16T17:11:19.074Z	s3://test-bucket/tstfiles/tst05	2~1nElF0c3uq5FnZ9cKlsnGlXKATvjr0g
...
2022-03-16T17:12:41.583Z	s3://test-bucket/tstfiles/tst28	2~exVQUILhVSmFqWxCuAflRa4Tfq4nUQa

I can abort the uploads with

$ s3cmd abortmp s3://test-bucket/tstfiles/tst05 2~1nElF0c3uq5FnZ9cKlsnGlXKATvjr0g
...
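Running abortmp by hand for every upload gets tedious; a small script can drive it from the listing. This is a sketch, not an official tool: the line format it parses ("timestamp path upload-id") is inferred from the output shown above, so the regex may need adjusting for other s3cmd versions.

```python
#!/usr/bin/env python3
"""Sketch: abort every incomplete multipart upload a bucket reports."""
import re
import subprocess

# Matches lines like:
#   2022-03-16T17:11:19.074Z s3://bucket/key 2~1nElF0c3uq5FnZ9cKlsnGlXKATvjr0g
LISTING_RE = re.compile(r"^(\S+Z)\s+(s3://\S+)\s+(\S+)$")


def parse_multipart_listing(text):
    """Return (path, upload_id) pairs from `s3cmd multipart` output."""
    pairs = []
    for line in text.splitlines():
        m = LISTING_RE.match(line.strip())
        if m:
            pairs.append((m.group(2), m.group(3)))
    return pairs


def abort_all(bucket="s3://test-bucket"):
    """List incomplete multipart uploads in `bucket` and abort each one."""
    out = subprocess.run(
        ["s3cmd", "multipart", bucket],
        capture_output=True, text=True, check=True,
    ).stdout
    for path, upload_id in parse_multipart_listing(out):
        subprocess.run(["s3cmd", "abortmp", path, upload_id], check=True)


if __name__ == "__main__":
    abort_all()
```

The header line ("Initiated Path Id") and the bare bucket line don't match the regex, so only the actual upload entries are acted on.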
On the latest master, I see that these objects are deleted immediately after abortmp. I believe this issue may have been fixed as part of [1], backported to v16.2.7 [2]. Maybe you could try upgrading your cluster and rechecking.
Thanks,
Soumya

[1] https://tracker.ceph.com/issues/53222
[2] https://tracker.ceph.com/issues/53291