Thanks, Soumya.

It's also possible that what's reproducing is the known (space) leak during
re-upload of multipart parts, described here:
https://tracker.ceph.com/issues/44660. A fix for this is being worked on;
it's taking a while.

Matt

On Thu, Mar 17, 2022 at 10:31 AM Soumya Koduri <skoduri@xxxxxxxxxx> wrote:
>
> On 3/17/22 17:16, Ulrich Klein wrote:
> > Hi,
> >
> > My second attempt to get help with a problem I've been trying to solve for about 6 months now.
> >
> > I have a Ceph 16.2.6 test cluster, used almost exclusively for providing RGW/S3 service, similar to a production cluster.
> >
> > The problem I have is this:
> > A client uploads (via S3) a bunch of large files into a bucket via multipart uploads.
> > The upload(s) get interrupted and retried.
> > In the end, from a client's perspective, all the files are visible and everything looks fine.
> > But on the cluster there are many more objects in the buckets.
> > Even after cleaning out the incomplete multipart uploads there are too many objects.
> > Even after deleting all the visible objects from the bucket there are still objects in the bucket.
> > I have so far found no way to get rid of those left-over objects.
> > It's screwing up space accounting and I'm afraid I'll eventually have a cluster full of those lost objects.
> > The only way to clean up seems to be to copy the contents of a bucket to a new bucket and delete the screwed-up bucket. But on a production system that's not always a real option.
> >
> > I've found a variety of older threads that describe a similar problem, none of them describing a solution :(
> >
> > I can pretty easily reproduce the problem with this sequence:
> >
> > On a client system, create a directory with ~30 200MB files. (On a faster system I'd probably need bigger or more files.)
> > tstfiles/tst01 - tst29
> >
> > Run
> > $ rclone mkdir tester:/test-bucket   # creates a bucket on the test system with user tester
> > Run
> > $ rclone sync -v tstfiles tester:/test-bucket/tstfiles
> > a couple of times (6-8), interrupting each one via Ctrl-C.
> > Eventually let one finish.
> >
> > Now I can use s3cmd to see all the files:
> > $ s3cmd ls -lr s3://test-bucket/tstfiles
> > 2022-03-16 17:11 200M ecb28853bd18eeae185b0b12bd47333c-40 STANDARD s3://test-bucket/tstfiles/tst01
> > ...
> > 2022-03-16 17:13 200M ecb28853bd18eeae185b0b12bd47333c-40 STANDARD s3://test-bucket/tstfiles/tst29
> >
> > ... and to list incomplete uploads:
> > $ s3cmd multipart s3://test-bucket
> > s3://test-bucket/
> > Initiated                Path                              Id
> > 2022-03-16T17:11:19.074Z s3://test-bucket/tstfiles/tst05   2~1nElF0c3uq5FnZ9cKlsnGlXKATvjr0g
> > ...
> > 2022-03-16T17:12:41.583Z s3://test-bucket/tstfiles/tst28   2~exVQUILhVSmFqWxCuAflRa4Tfq4nUQa
> >
> > I can abort the uploads with
> > $ s3cmd abortmp s3://test-bucket/tstfiles/tst05 2~1nElF0c3uq5FnZ9cKlsnGlXKATvjr0g
> > ...
>
> On the latest master, I see that these objects are deleted immediately
> post abortmp. I believe this issue may have been fixed as part of [1],
> backported to v16.2.7 [2]. Maybe you could try upgrading your cluster
> and recheck.
>
> Thanks,
> Soumya
>
> [1] https://tracker.ceph.com/issues/53222
> [2] https://tracker.ceph.com/issues/53291
>
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
>

--
Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-821-5101
fax.  734-769-8938
cel.  734-216-5309
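
To get a feel for how much is actually leaking while a fix is pending, one rough check is to compare what the S3 API shows for the bucket against what RGW is holding in RADOS. The commands below are only a sketch, not a definitive procedure: the bucket name is taken from the reproduction above, and the radoslist subcommand should be present on 16.2.x, but please verify against your own version.

$ s3cmd ls -r s3://test-bucket | wc -l                        # objects visible to S3 clients
$ radosgw-admin bucket stats --bucket=test-bucket             # RGW's own accounting (num_objects, size_kb_actual)
$ radosgw-admin bucket radoslist --bucket=test-bucket | wc -l # RADOS objects backing the bucket, leaked parts included

If the radoslist count stays well above what the S3 listing and the bucket stats suggest, even after aborting the incomplete multipart uploads, you are most likely looking at the leaked parts tracked in the issues above.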
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx