Thanks, Soumya.

It's also possible that what's reproducing is the known (space) leak during
re-upload of multipart parts, described here:
https://tracker.ceph.com/issues/44660. A fix for this is being worked on;
it's taking a while.

Matt

On Thu, Mar 17, 2022 at 10:31 AM Soumya Koduri <skoduri@xxxxxxxxxx> wrote:
>
> On 3/17/22 17:16, Ulrich Klein wrote:
> > Hi,
> >
> > My second attempt to get help with a problem I've been trying to solve for about 6 months now.
> >
> > I have a Ceph 16.2.6 test cluster, used almost exclusively for providing RGW/S3 service, similar to a production cluster.
> >
> > The problem I have is this:
> > A client uploads (via S3) a bunch of large files into a bucket via multipart uploads.
> > The upload(s) get interrupted and retried.
> > In the end, from a client's perspective, all the files are visible and everything looks fine.
> > But on the cluster there are many more objects in the buckets.
> > Even after cleaning out the incomplete multipart uploads there are too many objects.
> > Even after deleting all the visible objects from the bucket there are still objects in the bucket.
> > I have so far found no way to get rid of those left-over objects.
> > It's screwing up space accounting and I'm afraid I'll eventually have a cluster full of those lost objects.
> > The only way to clean up seems to be to copy the contents of a bucket to a new bucket and delete the screwed-up bucket. But on a production system that's not always a real option.
> >
> > I've found a variety of older threads that describe a similar problem, none of them describing a solution :(
> >
> > I can pretty easily reproduce the problem with this sequence:
> >
> > On a client system, create a directory with ~30 200MB files. (On a faster system I'd probably need bigger or more files.)
> > tstfiles/tst01 - tst29
> >
> > Run
> > $ rclone mkdir tester:/test-bucket   # creates a bucket on the test system with user tester
> > Run
> > $ rclone sync -v tstfiles tester:/test-bucket/tstfiles
> > a couple of times (6-8), interrupting each one via Ctrl-C.
> > Eventually let one finish.
> >
> > Now I can use s3cmd to see all the files:
> > $ s3cmd ls -lr s3://test-bucket/tstfiles
> > 2022-03-16 17:11 200M ecb28853bd18eeae185b0b12bd47333c-40 STANDARD s3://test-bucket/tstfiles/tst01
> > ...
> > 2022-03-16 17:13 200M ecb28853bd18eeae185b0b12bd47333c-40 STANDARD s3://test-bucket/tstfiles/tst29
> >
> > ... and to list incomplete uploads:
> > $ s3cmd multipart s3://test-bucket
> > s3://test-bucket/
> > Initiated                Path                              Id
> > 2022-03-16T17:11:19.074Z s3://test-bucket/tstfiles/tst05   2~1nElF0c3uq5FnZ9cKlsnGlXKATvjr0g
> > ...
> > 2022-03-16T17:12:41.583Z s3://test-bucket/tstfiles/tst28   2~exVQUILhVSmFqWxCuAflRa4Tfq4nUQa
> >
> > I can abort the uploads with
> > $ s3cmd abortmp s3://test-bucket/tstfiles/tst05 2~1nElF0c3uq5FnZ9cKlsnGlXKATvjr0g
> > ...
>
> On the latest master, I see that these objects are deleted immediately
> post abortmp. I believe this issue may have been fixed as part of [1],
> backported to v16.2.7 [2]. Maybe you could try upgrading your cluster
> and recheck.
>
> Thanks,
> Soumya
>
> [1] https://tracker.ceph.com/issues/53222
> [2] https://tracker.ceph.com/issues/53291
>
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
>

--
Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-821-5101
fax.  734-769-8938
cel.  734-216-5309
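
To get a feel for how much is actually leaking while a fix is pending, one rough check is to compare what the S3 API shows for the bucket against what RGW is holding in RADOS. The commands below are only a sketch, not a definitive procedure: the bucket name is taken from the reproduction above, and the radoslist subcommand should be present on 16.2.x, but please verify against your own version.

$ s3cmd ls -r s3://test-bucket | wc -l                        # objects visible to S3 clients
$ radosgw-admin bucket stats --bucket=test-bucket             # RGW's own accounting (num_objects, size_kb_actual)
$ radosgw-admin bucket radoslist --bucket=test-bucket | wc -l # RADOS objects backing the bucket, leaked parts included

If the radoslist count stays well above what the S3 listing and the bucket stats suggest, even after aborting the incomplete multipart uploads, you are most likely looking at the leaked parts tracked in the issues above.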
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx