Cleaning Up Failed Multipart Uploads

Greetings,

Background: If an object storage client re-uploads parts to a multipart object, RadosGW does not clean up all of the parts properly when the multipart upload is aborted or completed.  You can read all of the gory details (including reproduction steps) in this bug report: http://tracker.ceph.com/issues/16767.

My setup: a Hammer 0.94.6 cluster used only for S3-compatible object storage.  The RGW stripe size is 4 MiB.

My problem: I have buckets that are reporting terabytes more utilization (and, in one case, 200k more objects) than they should.  I am trying to remove the detritus from these multipart uploads, but removing the leftover parts directly from the .rgw.buckets pool is having no effect on the bucket stats (i.e. neither the object count nor the space used is declining).

To give an example, I have a client that uploaded a very large multipart object (8000 15MiB parts).  Due to a bug in the client, it uploaded each of the 8000 parts six times.  After the sixth attempt, it gave up and aborted the upload, at which point RGW removed the 8000 parts from the sixth attempt.  When I list the bucket's contents with radosgw-admin (radosgw-admin bucket list --bucket=<bucket> --max-entries=<size of bucket>), I see all of the object's 8000 parts five separate times, each in the 'multipart' namespace.
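Roughly, this is how I'm counting the leftovers (the bucket name and entry limit below are placeholders, and the exact way the multipart namespace shows up in the listing may vary by version):

    radosgw-admin bucket list --bucket=mybucket --max-entries=250000 > bucket_list.json
    # count listing entries that belong to the 'multipart' namespace
    grep -c 'multipart' bucket_list.json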

Since the multipart upload was aborted, I can't remove the object by name via the S3 interface.  Since my RGW stripe size is 4MiB, I know that each part of the object will be stored across 4 entries in the .rgw.buckets pool -- 4 MiB in a 'multipart' file, and 4, 4, and 3 MiB in three successive 'shadow' files.  I've created a script to remove these parts (rados -p .rgw.buckets rm <bucket_id>__multipart_<object+prefix>.<part> and rados -p .rgw.buckets rm <bucket_id>__shadow_<object+prefix>.<part>.[1-3]).  The removes are completing successfully (in that additional attempts to remove the object result in a failure), but I'm not seeing any decrease in the bucket's space used, nor am I seeing a decrease in the bucket's object count.  In fact, if I do another 'bucket list', all of the removed parts are still included.
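The removal script boils down to something like this (the bucket id, object-plus-upload-id prefix, and part count shown here are placeholders):

    BUCKET_ID="default.12345.1"        # placeholder bucket id
    PREFIX="myobject.2~AbCdEfGh"       # placeholder object + upload id prefix
    for part in $(seq 1 8000); do
      # the 'multipart' entry holds the first 4 MiB stripe of the part
      rados -p .rgw.buckets rm "${BUCKET_ID}__multipart_${PREFIX}.${part}"
      # the remaining stripes of each 15 MiB part live in three 'shadow' entries
      for shadow in 1 2 3; do
        rados -p .rgw.buckets rm "${BUCKET_ID}__shadow_${PREFIX}.${part}.${shadow}"
      done
    done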

I've looked at the output of 'gc list --include-all', and the removed parts are never showing up for garbage collection.  Garbage collection is otherwise functioning normally and will successfully remove data for any object properly removed via the S3 interface.
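(For what it's worth, I'm checking along these lines, where the grep pattern is a placeholder for one of the removed part names, and nothing for these parts ever appears:

    radosgw-admin gc list --include-all | grep '__multipart_myobject'
)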

I've also gone so far as to write a script to list the contents of the bucket shards in the .rgw.buckets.index pool, check for the existence of each entry in .rgw.buckets, and remove entries that cannot be found, but that also fails to decrement the bucket's size and object count.
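That script is essentially the following sketch (the bucket id and shard count are placeholders, the index-key to data-object-name mapping is simplified, and the actual removal step uses 'rados rmomapkey' on the flagged keys):

    BUCKET_ID="default.12345.1"    # placeholder
    NUM_SHARDS=32                  # placeholder; unsharded indexes use just .dir.<bucket_id>
    for shard in $(seq 0 $((NUM_SHARDS - 1))); do
      rados -p .rgw.buckets.index listomapkeys ".dir.${BUCKET_ID}.${shard}" |
      while read -r key; do
        # if the data object backing this index entry is gone, flag the entry
        rados -p .rgw.buckets stat "${BUCKET_ID}_${key}" >/dev/null 2>&1 ||
          echo "stale index entry in shard ${shard}: ${key}"
      done
    done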

What am I missing here?  Where, aside from .rgw.buckets and .rgw.buckets.index, is RGW looking to determine the object count and space used for a bucket?

Many thanks to any and all who can assist.

Brian Felton


