Re: RGW/S3 losing multipart upload objects

Ulrich Klein <Ulrich.Klein@xxxxxxxxxxxxxxx> · Thu, 17 Mar 2022 15:43:37 +0100

OK, I’ll try again on 16.2.7. The only downside is that I then can’t use the dashboard in Safari (i.e. on iPads) for monitoring anymore.

And to be clear: from a user’s/client’s perspective the objects do disappear. Only on the Ceph/RGW side, including accounting, are they still there, and they can’t be removed.
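One way to put numbers on that gap is to compare the object count a client sees with the count in RGW's own bucket statistics. This is a minimal sketch, not a verified tool: it assumes s3cmd is configured for the bucket owner, that it runs on a node where radosgw-admin works, and that "radosgw-admin bucket stats" prints pretty-printed JSON with an "rgw.main" usage section (a real JSON tool such as jq would be more robust than the awk used here). The helper names are hypothetical.

```shell
# Hypothetical helpers for comparing client-visible objects with RGW accounting.
# Assumption: radosgw-admin prints pretty-printed JSON, one key per line.

# visible_count: count object lines (those containing an s3:// URI) in
# "s3cmd ls -r" output read from stdin.
visible_count() {
    grep -c 's3://'
}

# rgw_num_objects: print the num_objects value that follows the "rgw.main"
# key in "radosgw-admin bucket stats" JSON read from stdin.
rgw_num_objects() {
    awk '/"rgw.main"/ { in_main = 1 }
         in_main && /"num_objects"/ { gsub(/[^0-9]/, ""); print; exit }'
}

# check_bucket NAME: print both counts for bucket NAME side by side.
check_bucket() {
    seen=$(s3cmd ls -r "s3://$1" | visible_count)
    stored=$(radosgw-admin bucket stats --bucket="$1" | rgw_num_objects)
    echo "visible via S3: $seen, counted by RGW: $stored"
}
```

For example, check_bucket test-bucket. With no multipart uploads pending, the two numbers would be expected to match; a gap that persists after aborting all uploads is the symptom described in this thread.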

Ciao, Uli 

> On 17. 03 2022, at 15:30, Soumya Koduri <skoduri@xxxxxxxxxx> wrote:
> 
> On 3/17/22 17:16, Ulrich Klein wrote:
>> Hi,
>> 
>> This is my second attempt to get help with a problem I've been trying to solve for about 6 months now.
>> 
>> I have a Ceph 16.2.6 test cluster, used almost exclusively for providing RGW/S3 service, similar to a production cluster.
>> 
>> The problem I have is this:
>> A client uploads (via S3) a bunch of large files into a bucket using multipart uploads
>> The upload(s) get interrupted and retried
>> In the end from a client's perspective all the files are visible and everything looks fine.
>> But on the cluster there are many more objects in the buckets
>> Even after cleaning out the incomplete multipart uploads there are too many objects
>> Even after deleting all the visible objects from the bucket there are still objects in the bucket
>> I have so far found no way to get rid of those left-over objects.
>> It's screwing up space accounting and I'm afraid I'll eventually have a cluster full of those lost objects.
>> The only way to clean up seems to be to copy the contents of a bucket to a new bucket and delete the screwed-up bucket. But on a production system that's not always a realistic option.
>> 
>> I've found a variety of older threads that describe a similar problem. None of them describes a solution :(
>> 
>> 
>> 
>> I can pretty easily reproduce the problem with this sequence:
>> 
>> On a client system, create a directory with ~30 files of 200 MB each. (On a faster system I'd probably need bigger or more files.)
>> tstfiles/tst01 - tst29
>> 
>> Run
>> $ rclone mkdir tester:/test-bucket # creates a bucket on the test system with user tester
>> Then run
>> $ rclone sync -v tstfiles tester:/test-bucket/tstfiles
>> a couple of times (6-8), interrupting each one with CTRL-C.
>> Eventually let one finish.
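The reproduction steps above could be scripted roughly as follows. This is a minimal sketch, not part of the original report: the function names are hypothetical, it assumes the rclone remote "tester" already exists and that GNU coreutils "timeout" is available, and the cutoff in seconds is a guess that would need tuning so each sync really is interrupted mid-transfer.

```shell
# Hypothetical helpers for the interrupted-upload reproduction.

# make_test_files DIR COUNT SIZE_MB: create COUNT files of SIZE_MB MiB each,
# named tst01, tst02, ..., filled with random data.
make_test_files() {
    dir=$1; count=$2; size_mb=$3
    mkdir -p "$dir"
    i=1
    while [ "$i" -le "$count" ]; do
        name=$(printf 'tst%02d' "$i")
        dd if=/dev/urandom of="$dir/$name" bs=1048576 count="$size_mb" 2>/dev/null
        i=$((i + 1))
    done
}

# interrupted_syncs SRC DEST ROUNDS SECONDS: start "rclone sync" ROUNDS times,
# sending SIGINT (as Ctrl-C would) after SECONDS each time, then let one
# final sync run to completion.
interrupted_syncs() {
    src=$1; dest=$2; rounds=$3; secs=$4
    n=1
    while [ "$n" -le "$rounds" ]; do
        timeout -s INT "$secs" rclone sync -v "$src" "$dest"
        n=$((n + 1))
    done
    rclone sync -v "$src" "$dest"
}
```

For example: make_test_files tstfiles 29 200, then rclone mkdir tester:/test-bucket, then interrupted_syncs tstfiles tester:/test-bucket/tstfiles 7 20. Scripting the interruption with a fixed timeout makes the runs repeatable, at the cost of having to pick a cutoff that suits the link speed.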
>> 
>> Now I can use s3cmd to see all the files:
>> $ s3cmd ls -lr s3://test-bucket/tstfiles
>> 2022-03-16 17:11   200M  ecb28853bd18eeae185b0b12bd47333c-40  STANDARD     s3://test-bucket/tstfiles/tst01
>> ...
>> 2022-03-16 17:13   200M  ecb28853bd18eeae185b0b12bd47333c-40  STANDARD     s3://test-bucket/tstfiles/tst29
>> 
>> ... and to list incomplete uploads:
>> $ s3cmd multipart s3://test-bucket
>> s3://test-bucket/
>> Initiated	Path	Id
>> 2022-03-16T17:11:19.074Z	s3://test-bucket/tstfiles/tst05	2~1nElF0c3uq5FnZ9cKlsnGlXKATvjr0g
>> ...
>> 2022-03-16T17:12:41.583Z	s3://test-bucket/tstfiles/tst28	2~exVQUILhVSmFqWxCuAflRa4Tfq4nUQa
>> 
>> I can abort the uploads with
>> $ s3cmd abortmp s3://test-bucket/tstfiles/tst05 2~1nElF0c3uq5FnZ9cKlsnGlXKATvjr0g
>> ...
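Aborting uploads one at a time gets tedious with many files; a loop over the same listing could abort them all in one go. This is a minimal sketch, assuming the tab-separated Initiated/Path/Id layout shown above stays the same across s3cmd versions; pending_uploads and abort_all_uploads are hypothetical helper names.

```shell
# Hypothetical helpers for aborting all incomplete multipart uploads in a bucket.

# pending_uploads: read an "s3cmd multipart" listing on stdin and print one
# "PATH<TAB>UPLOAD_ID" line per pending upload, skipping the bucket line and
# the "Initiated Path Id" header.
pending_uploads() {
    awk -F '\t' 'NF == 3 && $2 ~ /^s3:\/\// { print $2 "\t" $3 }'
}

# abort_all_uploads BUCKET: abort each pending upload in BUCKET,
# e.g. abort_all_uploads s3://test-bucket
abort_all_uploads() {
    tab=$(printf '\t')
    s3cmd multipart "$1" | pending_uploads |
        while IFS="$tab" read -r path id; do
            s3cmd abortmp "$path" "$id"
        done
}
```

Note that, per this report, on 16.2.6 this only removes the uploads from the listing; the leftover objects are still counted on the RGW side.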
> 
> 
> 
> On the latest master, I see that these objects are deleted immediately after abortmp. I believe this issue may have been fixed as part of [1], which was backported to v16.2.7 [2]. Maybe you could try upgrading your cluster and recheck.
> 
> 
> Thanks,
> 
> Soumya
> 
> 
> [1] https://tracker.ceph.com/issues/53222
> 
> [2] https://tracker.ceph.com/issues/53291
> 
> 
> 
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx