Re: RGW/S3 losing multipart upload objects

Yes, that is one of the threads that looks very similar to my problem, just with no resolution for me.
I have a multi-site setup, so no resharding. Tried it anyway — had to set up RGW from scratch afterwards due to all the “funny” errors.

So, hoping for 16.2.7 or the “in the works” fix :)

Ciao, Uli 

> On 17. 03 2022, at 15:34, Matt Benjamin <mbenjami@xxxxxxxxxx> wrote:
> 
> Thanks, Soumya.
> 
> It's also possible that what you're reproducing is the known (space) leak
> during re-upload of multipart parts, described here:
> https://tracker.ceph.com/issues/44660.
> A fix for this is being worked on; it's taking a while.
> 
> Matt
> 
> On Thu, Mar 17, 2022 at 10:31 AM Soumya Koduri <skoduri@xxxxxxxxxx> wrote:
>> 
>> On 3/17/22 17:16, Ulrich Klein wrote:
>>> Hi,
>>> 
>>> This is my second attempt to get help with a problem I've been trying to solve for about 6 months now.
>>> 
>>> I have a Ceph 16.2.6 test cluster, used almost exclusively for providing RGW/S3 service, similar to a production cluster.
>>> 
>>> The problem I have is this:
>>> A client uploads (via S3) a bunch of large files into a bucket via multipart uploads.
>>> The upload(s) get interrupted and retried.
>>> In the end, from the client's perspective, all the files are visible and everything looks fine.
>>> But on the cluster there are many more objects in the buckets.
>>> Even after cleaning out the incomplete multipart uploads there are too many objects.
>>> Even after deleting all the visible objects from the bucket there are still objects in the bucket.
>>> I have so far found no way to get rid of those left-over objects.
>>> It's screwing up space accounting, and I'm afraid I'll eventually have a cluster full of those lost objects.
>>> The only way to clean up seems to be to copy the contents of a bucket to a new bucket and delete the screwed-up bucket. But on a production system that's not always a real option.
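>>> 
>>> For what it's worth, this is roughly how I see the mismatch on the cluster side (a sketch using my test bucket name; the exact numbers will of course differ on other setups):
>>> 
>>> $ s3cmd ls -r s3://test-bucket | wc -l                          # what S3 clients see
>>> $ radosgw-admin bucket stats --bucket=test-bucket               # num_objects / size_kb_actual as RGW accounts them
>>> $ radosgw-admin bucket radoslist --bucket=test-bucket | wc -l   # rados objects actually backing the bucket, incl. left-over multipart pieces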
>>> 
>>> I've found a variety of older threads that describe a similar problem. None of them describes a solution :(
>>> 
>>> 
>>> 
>>> I can pretty easily reproduce the problem with this sequence:
>>> 
>>> On a client system create a directory with ~30 200MB files. (On a faster system I'd probably need bigger or more files)
>>> tstfiles/tst01 - tst29
>>> 
>>> Run
>>> $ rclone mkdir tester:/test-bucket # creates a bucket on the test system with user tester
>>> then run
>>> $ rclone sync -v tstfiles tester:/test-bucket/tstfiles
>>> a couple of times (6-8), interrupting each one via Ctrl-C (a rough sketch of the loop I use is below).
>>> Eventually let one finish.
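>>> 
>>> A minimal sketch of that loop (the 60s timeout is arbitrary; "timeout -s INT" approximates the Ctrl-C):
>>> 
>>> for i in $(seq 1 7); do
>>>     timeout -s INT 60 rclone sync -v tstfiles tester:/test-bucket/tstfiles
>>> done
>>> rclone sync -v tstfiles tester:/test-bucket/tstfiles   # let the last one finish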
>>> 
>>> Now I can use s3cmd to see all the files:
>>> $ s3cmd ls -lr s3://test-bucket/tstfiles
>>> 2022-03-16 17:11   200M  ecb28853bd18eeae185b0b12bd47333c-40  STANDARD     s3://test-bucket/tstfiles/tst01
>>> ...
>>> 2022-03-16 17:13   200M  ecb28853bd18eeae185b0b12bd47333c-40  STANDARD     s3://test-bucket/tstfiles/tst29
>>> 
>>> ... and to list incomplete uploads:
>>> $ s3cmd multipart s3://test-bucket
>>> s3://test-bucket/
>>> Initiated     Path    Id
>>> 2022-03-16T17:11:19.074Z      s3://test-bucket/tstfiles/tst05 2~1nElF0c3uq5FnZ9cKlsnGlXKATvjr0g
>>> ...
>>> 2022-03-16T17:12:41.583Z      s3://test-bucket/tstfiles/tst28 2~exVQUILhVSmFqWxCuAflRa4Tfq4nUQa
>>> 
>>> I can abort the uploads with
>>> $  s3cmd abortmp s3://test-bucket/tstfiles/tst05 2~1nElF0c3uq5FnZ9cKlsnGlXKATvjr0g
>>> ...
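>>> 
>>> To clear all of them in one go I use something along these lines (a sketch that assumes the "s3cmd multipart" output format shown above, i.e. path in column 2 and upload id in column 3):
>>> 
>>> $ s3cmd multipart s3://test-bucket | awk 'NR>2 {print $2, $3}' | \
>>>       while read path id; do s3cmd abortmp "$path" "$id"; done
>>> 
>>> The aborts succeed, but the left-over objects and the wrong space accounting remain.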
>> 
>> 
>> 
>> On the latest master, I see that these objects are deleted immediately
>> after abortmp. I believe this issue may have been fixed as part of [1],
>> backported to v16.2.7 [2]. Maybe you could try upgrading your cluster
>> and rechecking.
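>> 
>> If the cluster is managed by cephadm, the upgrade itself is just something like (a sketch; adjust for your deployment method):
>> 
>> $ ceph orch upgrade start --ceph-version 16.2.7
>> $ ceph orch upgrade status   # watch progress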
>> 
>> 
>> Thanks,
>> 
>> Soumya
>> 
>> 
>> [1] https://tracker.ceph.com/issues/53222
>> 
>> [2] https://tracker.ceph.com/issues/53291
>> 
>> 
>> 
>> 
> 
> 
> -- 
> 
> Matt Benjamin
> Red Hat, Inc.
> 315 West Huron Street, Suite 140A
> Ann Arbor, Michigan 48103
> 
> http://www.redhat.com/en/technologies/storage
> 
> tel.  734-821-5101
> fax.  734-769-8938
> cel.  734-216-5309
> 

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



