Re: Deleting buckets and objects fails to reduce reported cluster usage

On Sat, Nov 29, 2014 at 2:26 PM, Ben <b@benjackson.email> wrote:
>
> On 29/11/14 11:40, Yehuda Sadeh wrote:
>>
>> On Fri, Nov 28, 2014 at 1:38 PM, Ben <b@benjackson.email> wrote:
>>>
>>> On 29/11/14 01:50, Yehuda Sadeh wrote:
>>>>
>>>> On Thu, Nov 27, 2014 at 9:22 PM, Ben <b@benjackson.email> wrote:
>>>>>
>>>>> On 2014-11-28 15:42, Yehuda Sadeh wrote:
>>>>>>
>>>>>> On Thu, Nov 27, 2014 at 2:15 PM, b <b@benjackson.email> wrote:
>>>>>>>
>>>>>>> On 2014-11-27 11:36, Yehuda Sadeh wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, Nov 26, 2014 at 3:49 PM, b <b@benjackson.email> wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 2014-11-27 10:21, Yehuda Sadeh wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Wed, Nov 26, 2014 at 3:09 PM, b <b@benjackson.email> wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 2014-11-27 09:38, Yehuda Sadeh wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Nov 26, 2014 at 2:32 PM, b <b@benjackson.email> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> I've been deleting a bucket which originally had 60TB of data in it;
>>>>>>>>>>>>> with our cluster doing only 1 replication, the total usage was 120TB.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I've been deleting the objects slowly using S3 browser, and I can see
>>>>>>>>>>>>> the bucket usage is now down to around 2.5TB, or 5TB with duplication,
>>>>>>>>>>>>> but the usage in the cluster has not changed.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I've looked at garbage collection (radosgw-admin gc list --include all)
>>>>>>>>>>>>> and it just reports square brackets "[]".
>>>>>>>>>>>>>
>>>>>>>>>>>>> I've run 'radosgw-admin temp remove --date=2014-11-20', and it doesn't
>>>>>>>>>>>>> appear to have any effect.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Is there a way to check where this space is being consumed?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Running 'ceph df', the USED space in the buckets pool is not showing
>>>>>>>>>>>>> any of the 57TB that should have been freed up from the deletion so far.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Running 'radosgw-admin bucket stats | jshon | grep size_kb_actual' and
>>>>>>>>>>>>> adding up all the buckets' usage shows that the space has been freed
>>>>>>>>>>>>> from the bucket, but the cluster is all sorts of messed up.
>>>>>>>>>>>>>
>>>>>>>>>>>>> ANY IDEAS? What can I look at?
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Can you run 'radosgw-admin gc list --include-all'?
>>>>>>>>>>>>
>>>>>>>>>>>> Yehuda
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> I've done it before, and it just returns square brackets [] (see below)
>>>>>>>>>>>
>>>>>>>>>>> radosgw-admin gc list --include-all
>>>>>>>>>>> []
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Do you know which of the rados pools have all that extra data? Try to
>>>>>>>>>> list that pool's objects, verify that there are no surprises there
>>>>>>>>>> (e.g., use 'rados -p <pool> ls').
>>>>>>>>>>
>>>>>>>>>> Yehuda
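
A minimal sketch of how that check might be scripted, assuming the data pool is
.rgw.buckets and that raw rgw object names begin with a per-bucket prefix (such
as the '4804.14' mentioned later in this thread); both are assumptions, so
adjust to your own setup:

    # dump the raw object listing once -- it can be huge, so keep it in a file
    rados -p .rgw.buckets ls > /tmp/raw_objects.txt

    # raw rgw object names typically look like '<bucket prefix>_<name>' or
    # '<bucket prefix>__shadow_...', so tallying everything before the first
    # underscore gives a rough per-bucket breakdown of where the space is
    cut -d_ -f1 /tmp/raw_objects.txt | sort | uniq -c | sort -rn | head -20

Comparing the prefixes that show up against 'radosgw-admin bucket stats' output
should make any prefixes belonging to deleted buckets stand out.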
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I'm just running that command now, and it's taking some time. There is
>>>>>>>>> a large number of objects.
>>>>>>>>>
>>>>>>>>> Once it has finished, what should I be looking for?
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> I assume the pool in question is the one that holds your objects' data?
>>>>>>>> You should be looking for objects that are not expected to exist
>>>>>>>> anymore, and objects of buckets that don't exist anymore. The problem
>>>>>>>> here is to identify these.
>>>>>>>> I suggest starting by looking at all the existing buckets, composing a
>>>>>>>> list of the bucket prefixes for the existing buckets, and checking
>>>>>>>> whether there are objects that have different prefixes.
>>>>>>>>
>>>>>>>> Yehuda
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Any ideas? I've found the prefix; the number of objects in the pool that
>>>>>>> match that prefix is around 21 million, while the actual 'radosgw-admin
>>>>>>> bucket stats' command reports the bucket as only having 1.2 million.
>>>>>>
>>>>>>
>>>>>> Well, the objects you're seeing are raw objects, and since rgw stripes
>>>>>> the data, it is expected to have more raw objects than objects in the
>>>>>> bucket. Still, it seems that you have far too many of these. You can
>>>>>> try to check whether there are pending multipart uploads that were
>>>>>> never completed, using the S3 api.
>>>>>> At the moment there's no easy way to figure out which raw objects are
>>>>>> not supposed to exist. The process would be like this:
>>>>>> 1. rados ls -p <data pool>
>>>>>>    keep the list sorted
>>>>>> 2. list objects in the bucket
>>>>>> 3. for each object in (2), do: radosgw-admin object stat
>>>>>>    --bucket=<bucket> --object=<object> --rgw-cache-enabled=false
>>>>>>    (disabling the cache so that it goes quicker)
>>>>>> 4. look at the result of (3), and generate a list of all the parts
>>>>>> 5. sort the result of (4), compare it to (1)
>>>>>>
>>>>>> Note that if you're running firefly or later, the raw objects are not
>>>>>> listed explicitly in the output of the command you run at (3), so you
>>>>>> might need a different procedure, e.g., find the random string used by
>>>>>> the bucket's raw objects and remove those entries from the list
>>>>>> generated in (1), etc.
>>>>>>
>>>>>> That's basically it.
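
A rough, untested sketch of how steps (1)-(5) might be scripted. Pool, bucket,
and prefix names are placeholders; it assumes 'radosgw-admin bucket list'
returns JSON with a top-level "name" field per entry (check --max-entries for
large buckets), and the grep that pulls part names out of the 'object stat'
output is only a stand-in for whatever your manifest format actually contains
(on firefly or later you'd need the alternative approach described above):

    POOL=.rgw.buckets          # placeholder: your rgw data pool
    BUCKET=mybucket            # placeholder: the bucket being checked
    PREFIX=4804.14             # placeholder: the bucket's marker, from 'bucket stats'

    # (1) all raw objects currently in the data pool, sorted
    rados -p "$POOL" ls | sort -u > /tmp/raw.txt

    # (2) all objects the bucket index still knows about
    radosgw-admin bucket list --bucket="$BUCKET" \
        | python -c 'import json,sys; print("\n".join(e["name"] for e in json.load(sys.stdin)))' \
        | sort -u > /tmp/bucket.txt

    # (3)+(4) stat each surviving object and collect the raw parts its manifest
    # references (cache disabled so it runs faster, as suggested above)
    while read -r obj; do
        radosgw-admin object stat --bucket="$BUCKET" --object="$obj" \
            --rgw-cache-enabled=false
    done < /tmp/bucket.txt \
        | grep -o "\"${PREFIX}[^\"]*\"" | tr -d '"' | sort -u > /tmp/referenced.txt

    # (5) raw objects carrying that bucket's prefix that no surviving object
    # references -- candidates for leaked/undeleted data. Note that head
    # objects named '<prefix>_<object>' may not appear in the stat output, so
    # cross-check the result against /tmp/bucket.txt before trusting it.
    grep "^${PREFIX}" /tmp/raw.txt | comm -23 - /tmp/referenced.txt > /tmp/orphans.txt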
>>>>>> I'll be interested to figure out what happened and why the garbage
>>>>>> collection didn't work correctly. You could try verifying that it's
>>>>>> working by:
>>>>>>    - create an object (let's say ~10MB in size)
>>>>>>    - radosgw-admin object stat --bucket=<bucket> --object=<object>
>>>>>>      (keep this info; you'll need it for the later steps)
>>>>>>    - remove the object
>>>>>>    - run 'radosgw-admin gc list --include-all' and verify that the raw
>>>>>>      parts are listed there
>>>>>>    - wait a few hours, repeat the last step, and see that the parts
>>>>>>      don't appear there anymore
>>>>>>    - run 'rados -p <pool> ls' and check whether the raw objects still
>>>>>>      exist
>>>>>>
>>>>>> Yehuda
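
For anyone wanting to reproduce that check, it boils down to roughly the
following; the bucket/object names and the use of s3cmd are only examples (any
S3 client works), and the data pool name is a placeholder:

    # create and upload a ~10MB test object with your usual S3 client
    dd if=/dev/urandom of=/tmp/gc-test.bin bs=1M count=10
    s3cmd put /tmp/gc-test.bin s3://testbucket/gc-test.bin

    # record the manifest while the object still exists -- it lists the raw parts
    radosgw-admin object stat --bucket=testbucket --object=gc-test.bin \
        > /tmp/gc-test-stat.json

    # delete it through the S3 API; the raw tail parts should now show up in
    # the gc queue
    s3cmd del s3://testbucket/gc-test.bin
    radosgw-admin gc list --include-all

    # a few hours later the gc entries should be gone, and the part names
    # recorded in /tmp/gc-test-stat.json should no longer appear in this listing
    rados -p .rgw.buckets ls > /tmp/raw-after.txt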
>>>>>>
>>>>>>> Not sure where to go from here, and our cluster is slowly filling up
>>>>>>> while not clearing any space.
>>>>>
>>>>>
>>>>>
>>>>> I did the last section:
>>>>>>
>>>>>> I'll be interested to figure out what happened and why the garbage
>>>>>> collection didn't work correctly. You could try verifying that it's
>>>>>> working by:
>>>>>>    - create an object (let's say ~10MB in size)
>>>>>>    - radosgw-admin object stat --bucket=<bucket> --object=<object>
>>>>>>      (keep this info; you'll need it for the later steps)
>>>>>>    - remove the object
>>>>>>    - run 'radosgw-admin gc list --include-all' and verify that the raw
>>>>>>      parts are listed there
>>>>>>    - wait a few hours, repeat the last step, and see that the parts
>>>>>>      don't appear there anymore
>>>>>>    - run 'rados -p <pool> ls' and check whether the raw objects still
>>>>>>      exist
>>>>>
>>>>>
>>>>> I added the file, did a stat, and it displayed the json output.
>>>>> I removed the object and then tried to stat it again; this time the stat
>>>>> failed.
>>>>> After this, I ran the gc list --include-all command and it displayed
>>>>> nothing but the square brackets [].
>>>>
>>>> Was the object larger than 512k? Also, did you do it within the 300
>>>> seconds after removing the object?
>>>>
>>>> There should exist a garbage collection pool (by default .rgw.gc, but it
>>>> can be something different if you configured your zone differently). Can
>>>> you verify that you have it, and if so, what does it contain?
>>>>
>>>> Yehuda
>>>>
>>> Yes, the object was 10M. As soon as I had deleted it from the bucket, I ran
>>> the command to check garbage collection.
>>> There is a .rgw.gc pool; we haven't changed it from the default. It contains
>>> a number of objects, ~7800, but the size of the files is 0kb.
>>>
>> They're expected to be 0kb; the data only resides in their omap, and that's
>> not reflected in the objects' size. You could run 'rados listomapkeys' on
>> these.
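
A quick way to check whether any of those gc objects are actually holding
pending work might be the loop below; it just counts omap keys per gc hint
object, using the default .rgw.gc pool name mentioned above:

    # print any gc hint objects that still carry pending omap entries
    for obj in $(rados -p .rgw.gc ls); do
        n=$(rados -p .rgw.gc listomapkeys "$obj" | wc -l)
        [ "$n" -gt 0 ] && echo "$obj: $n pending entries"
    done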
>>
>>>>> Maybe garbage collection isn't working properly..
>>>>>
>>>>> Our gc settings are the following; we have 2 object gateways in our
>>>>> cluster too, client.radosgw.obj01 and client.radosgw.obj02 (from ceph.conf):
>>>>> [client.radosgw.obj01]
>>>>>     rgw dns name = ceph.###.###
>>>>>     host = obj01
>>>>>     keyring = /etc/ceph/keyring.radosgw.obj01
>>>>>     rgw socket path = /tmp/radosgw.sock
>>>>>     log file = /var/log/ceph/radosgw.log
>>>>>     rgw data = /var/lib/ceph/radosgw/obj01
>>>>>     rgw thread pool size = 128
>>>>>     rgw print continue = True
>>>>>     debug rgw = 0
>>>>>     rgw enable ops log = False
>>>>>     log to stderr = False
>>>>>     rgw enable usage log = False
>>>>>     rgw gc max objs = 7877
>>
>> You should put this line (rgw gc max objs) in the global section of your
>> ceph.conf. Either that, or run your radosgw-admin command with
>> '-n client.radosgw.obj02'. That might change some of the results you're
>> seeing (radosgw-admin gc list --include-all, etc.).
>>
>> Yehuda
>>
>>>>>     rgw gc obj min wait = 300
>>>>>     rgw gc processor period = 600
>>>>>     rgw init timeout = 180
>>>>>     rgw gc processor max time = 600
>>>>> [client.radosgw.obj02]
>>>>>     rgw dns name = ceph.###.###
>>>>>     host = obj02
>>>>>     keyring = /etc/ceph/keyring.radosgw.obj02
>>>>>     rgw socket path = /tmp/radosgw.sock
>>>>>     log file = /var/log/ceph/radosgw.log
>>>>>     rgw data = /var/lib/ceph/radosgw/obj02
>>>>>     rgw thread pool size = 128
>>>>>     rgw print continue = True
>>>>>     debug rgw = 0
>>>>>     rgw enable ops log = False
>>>>>     log to stderr = False
>>>>>     rgw enable usage log = False
>>>>>     rgw gc max objs = 7877
>>>>>     rgw gc obj min wait = 300
>>>>>     rgw gc processor period = 600
>>>>>     rgw init timeout = 180
>>>>>     rgw gc processor max time = 600
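
To spell out Yehuda's suggestion above about 'rgw gc max objs': radosgw-admin
only sees the non-default value if it reads the right ceph.conf section, so
either move the setting to the global section or name the gateway section
explicitly when running admin commands. The section name below is just the one
from the config above:

    # ceph.conf -- visible to both the gateways and radosgw-admin
    [global]
        rgw gc max objs = 7877

    # ...or keep it per-gateway and point radosgw-admin at that section:
    radosgw-admin -n client.radosgw.obj02 gc list --include-all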
>>>>>
>>
>
> I've finally deleted the entire bucket. All 60TB has been cleared from the
> bucket, and the bucket no longer exists.
>
> Yet running rados ls -p .rgw.buckets | grep '4804.14' still lists all the
> _shadow_ files that have that bucket's prefix.
>
> Any ideas why these aren't being deleted/cleaned up by garbage collection?

Are there any errors in the log? Can you provide a log (debug rgw = 20,
debug ms = 1) of the radosgw through the garbage collection stage?

Yehuda
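
For reference, one way to capture such a log would be to bump the debug levels
in the gateway's ceph.conf section and restart it. The section name and log
path below are taken from the config earlier in the thread; the restart command
depends on your distro:

    # ceph.conf on the gateway host
    [client.radosgw.obj01]
        debug rgw = 20
        debug ms = 1

    # restart radosgw (e.g. 'service radosgw restart'), wait for at least one
    # gc cycle ('rgw gc processor period' is 600s above), then collect
    # /var/log/ceph/radosgw.log covering that window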
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



