Hi,

I just tried again on a Quincy 17.2.0. Same procedure, same problem. I just wonder if nobody else sees that problem?

Ciao, Uli

> On 18. 03 2022, at 12:18, Ulrich Klein <ulrich.klein@xxxxxxxxxxxxxxx> wrote:
>
> I tried it on a mini-cluster (4 Raspberries) with 16.2.7.
> Same procedure, same effect. I just can't get rid of these objects.
>
> Is there any method that would allow me to delete these objects without damaging RGW?
>
> Ciao, Uli
>
>> On 17. 03 2022, at 15:30, Soumya Koduri <skoduri@xxxxxxxxxx> wrote:
>>
>> On 3/17/22 17:16, Ulrich Klein wrote:
>>> Hi,
>>>
>>> This is my second attempt to get help with a problem I've been trying to solve for about 6 months now.
>>>
>>> I have a Ceph 16.2.6 test cluster, used almost exclusively for providing RGW/S3 service, similar to a production cluster.
>>>
>>> The problem I have is this:
>>> A client uploads (via S3) a bunch of large files into a bucket via multipart uploads.
>>> The upload(s) get interrupted and retried.
>>> In the end, from the client's perspective, all the files are visible and everything looks fine.
>>> But on the cluster there are many more objects in the bucket.
>>> Even after cleaning out the incomplete multipart uploads there are too many objects.
>>> Even after deleting all the visible objects from the bucket there are still objects in the bucket.
>>> I have so far found no way to get rid of those left-over objects.
>>> It's screwing up space accounting, and I'm afraid I'll eventually have a cluster full of those lost objects.
>>> The only way to clean up seems to be to copy the contents of a bucket to a new bucket and delete the screwed-up bucket. But on a production system that's not always a real option.
>>>
>>> I've found a variety of older threads that describe a similar problem. None of them describing a solution :(
>>>
>>>
>>> I can pretty easily reproduce the problem with this sequence:
>>>
>>> On a client system create a directory with ~30 200MB files. (On a faster system I'd probably need bigger or more files.)
>>> tstfiles/tst01 - tst29
>>>
>>> Run
>>> $ rclone mkdir tester:/test-bucket   # creates a bucket on the test system with user tester
>>> Run
>>> $ rclone sync -v tstfiles tester:/test-bucket/tstfiles
>>> a couple of times (6-8), interrupting each one via Ctrl-C.
>>> Eventually let one finish.
>>>
>>> Now I can use s3cmd to see all the files:
>>> $ s3cmd ls -lr s3://test-bucket/tstfiles
>>> 2022-03-16 17:11  200M  ecb28853bd18eeae185b0b12bd47333c-40  STANDARD  s3://test-bucket/tstfiles/tst01
>>> ...
>>> 2022-03-16 17:13  200M  ecb28853bd18eeae185b0b12bd47333c-40  STANDARD  s3://test-bucket/tstfiles/tst29
>>>
>>> ... and to list incomplete uploads:
>>> $ s3cmd multipart s3://test-bucket
>>> s3://test-bucket/
>>> Initiated                 Path                              Id
>>> 2022-03-16T17:11:19.074Z  s3://test-bucket/tstfiles/tst05   2~1nElF0c3uq5FnZ9cKlsnGlXKATvjr0g
>>> ...
>>> 2022-03-16T17:12:41.583Z  s3://test-bucket/tstfiles/tst28   2~exVQUILhVSmFqWxCuAflRa4Tfq4nUQa
>>>
>>> I can abort the uploads with
>>> $ s3cmd abortmp s3://test-bucket/tstfiles/tst05 2~1nElF0c3uq5FnZ9cKlsnGlXKATvjr0g
>>> ...
>>
>>
>> On the latest master, I see that these objects are deleted immediately post abortmp. I believe this issue may have been fixed as part of [1], backported to v16.2.7 [2]. Maybe you could try upgrading your cluster and recheck.
>>
>> Thanks,
>> Soumya
>>
>> [1] https://tracker.ceph.com/issues/53222
>> [2] https://tracker.ceph.com/issues/53291
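P.S. In case someone wants to verify the leftovers on their own cluster, here is a rough sketch of the checks (bucket name as in the reproduction above; radosgw-admin run on a node with admin access; subcommand names as in 16.x/17.x). After all visible objects have been deleted and all multipart uploads aborted, the bucket should be empty, but:

$ radosgw-admin bucket stats --bucket=test-bucket
  # the usage section still reports non-zero num_objects/size for the left-over objects
$ radosgw-admin bucket radoslist --bucket=test-bucket
  # still lists raw RADOS objects (multipart/shadow parts) backing the bucket

And to abort all pending uploads in one go instead of running one abortmp per file (field positions taken from the s3cmd multipart listing above; adjust if your s3cmd prints a different layout):

$ s3cmd multipart s3://test-bucket | awk 'NR>2 {print $2, $3}' | \
    while read path id; do s3cmd abortmp "$path" "$id"; done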