On Fri, Nov 28, 2014 at 1:38 PM, Ben <b@benjackson.email> wrote:
> On 29/11/14 01:50, Yehuda Sadeh wrote:
>> On Thu, Nov 27, 2014 at 9:22 PM, Ben <b@benjackson.email> wrote:
>>> On 2014-11-28 15:42, Yehuda Sadeh wrote:
>>>> On Thu, Nov 27, 2014 at 2:15 PM, b <b@benjackson.email> wrote:
>>>>> On 2014-11-27 11:36, Yehuda Sadeh wrote:
>>>>>> On Wed, Nov 26, 2014 at 3:49 PM, b <b@benjackson.email> wrote:
>>>>>>> On 2014-11-27 10:21, Yehuda Sadeh wrote:
>>>>>>>> On Wed, Nov 26, 2014 at 3:09 PM, b <b@benjackson.email> wrote:
>>>>>>>>> On 2014-11-27 09:38, Yehuda Sadeh wrote:
>>>>>>>>>> On Wed, Nov 26, 2014 at 2:32 PM, b <b@benjackson.email> wrote:
>>>>>>>>>>>
>>>>>>>>>>> I've been deleting a bucket which originally had 60TB of data in it;
>>>>>>>>>>> with our cluster doing only 1 replication, the total usage was 120TB.
>>>>>>>>>>>
>>>>>>>>>>> I've been deleting the objects slowly using S3 browser, and I can see
>>>>>>>>>>> the bucket usage is now down to around 2.5TB (or 5TB with duplication),
>>>>>>>>>>> but the usage in the cluster has not changed.
>>>>>>>>>>>
>>>>>>>>>>> I've looked at garbage collection (radosgw-admin gc list --include all)
>>>>>>>>>>> and it just reports square brackets "[]".
>>>>>>>>>>>
>>>>>>>>>>> I've run radosgw-admin temp remove --date=2014-11-20, and it doesn't
>>>>>>>>>>> appear to have any effect.
>>>>>>>>>>>
>>>>>>>>>>> Is there a way to check where this space is being consumed?
>>>>>>>>>>>
>>>>>>>>>>> Running 'ceph df', the USED space in the buckets pool is not showing
>>>>>>>>>>> any of the 57TB that should have been freed up by the deletion so far.
>>>>>>>>>>>
>>>>>>>>>>> Running 'radosgw-admin bucket stats | jshon | grep size_kb_actual' and
>>>>>>>>>>> adding up all the buckets' usage, this shows that the space has been
>>>>>>>>>>> freed from the bucket, but the cluster is all sorts of messed up.
>>>>>>>>>>>
>>>>>>>>>>> ANY IDEAS? What can I look at?
>>>>>>>>>>
>>>>>>>>>> Can you run 'radosgw-admin gc list --include-all'?
>>>>>>>>>>
>>>>>>>>>> Yehuda
>>>>>>>>>
>>>>>>>>> I've done it before, and it just returns square brackets [] (see below)
>>>>>>>>>
>>>>>>>>> radosgw-admin gc list --include-all
>>>>>>>>> []
>>>>>>>>
>>>>>>>> Do you know which of the rados pools holds all that extra data? Try to
>>>>>>>> list that pool's objects and verify that there are no surprises there
>>>>>>>> (e.g., use 'rados -p <pool> ls').
>>>>>>>>
>>>>>>>> Yehuda
>>>>>>>
>>>>>>> I'm just running that command now, and it's taking some time. There is a
>>>>>>> large number of objects.
>>>>>>>
>>>>>>> Once it has finished, what should I be looking for?
>>>>>>
>>>>>> I assume the pool in question is the one that holds your object data?
>>>>>> You should be looking for objects that are not expected to exist anymore,
>>>>>> and objects of buckets that don't exist anymore. The problem here is to
>>>>>> identify these.
>>>>>> I suggest starting by looking at all the existing buckets: compose a
>>>>>> list of the bucket prefixes for the existing buckets, and then check
>>>>>> whether there are objects that have different prefixes.
>>>>>>
>>>>>> Yehuda
>>>>>
>>>>> Any ideas? I've found the prefix, and the number of objects in the pool
>>>>> that match that prefix is in the 21 millions.
>>>>> The actual 'radosgw-admin bucket stats' command reports it as only
>>>>> having 1.2 million.
>>>>
>>>> Well, the objects you're seeing are raw objects, and since rgw stripes
>>>> the data, it is expected to have more raw objects than objects in the
>>>> bucket. Still, it seems that you have far too many of these. You can
>>>> try to check whether there are pending multipart uploads that were
>>>> never completed, using the S3 API.
>>>> At the moment there's no easy way to figure out which raw objects are
>>>> not supposed to exist. The process would be like this:
>>>> 1. rados ls -p <data pool>
>>>>    (keep the list sorted)
>>>> 2. list objects in the bucket
>>>> 3. for each object in (2), do: radosgw-admin object stat
>>>>    --bucket=<bucket> --object=<object> --rgw-cache-enabled=false
>>>>    (disabling the cache so that it goes quicker)
>>>> 4. look at the result of (3), and generate a list of all the parts
>>>> 5. sort the result of (4), compare it to (1)
>>>>
>>>> Note that if you're running firefly or later, the raw objects are not
>>>> specified explicitly in the output of the command you run at (3), so
>>>> you might need a different procedure (e.g., find out the random string
>>>> that is being used for the raw objects, remove it from the list
>>>> generated in (1), etc.).
>>>>
>>>> That's basically it.
>>>> I'll be interested to figure out what happened, and why the garbage
>>>> collection didn't work correctly. You could try verifying that it's
>>>> working by:
>>>> - create an object (let's say ~10MB in size)
>>>> - radosgw-admin object stat --bucket=<bucket> --object=<object>
>>>>   (keep this info, see
>>>> - remove the object
>>>> - run radosgw-admin gc list --include-all and verify that the raw
>>>>   parts are listed there
>>>> - wait a few hours, repeat the last step, and see that the parts don't
>>>>   appear there anymore
>>>> - run rados -p <pool> ls, and check whether the raw objects still exist
>>>>
>>>> Yehuda
>>>>
>>>>> Not sure where to go from here, and our cluster is slowly filling up
>>>>> while not clearing any space.
>>>
>>> I did the last section:
>>>>
>>>> I'll be interested to figure out what happened, and why the garbage
>>>> collection didn't work correctly. You could try verifying that it's
>>>> working by:
>>>> - create an object (let's say ~10MB in size)
>>>> - radosgw-admin object stat --bucket=<bucket> --object=<object>
>>>>   (keep this info, see
>>>> - remove the object
>>>> - run radosgw-admin gc list --include-all and verify that the raw
>>>>   parts are listed there
>>>> - wait a few hours, repeat the last step, and see that the parts don't
>>>>   appear there anymore
>>>> - run rados -p <pool> ls, and check whether the raw objects still exist
>>>
>>> I added the file, did a stat, and it displayed the JSON output.
>>> I removed the object and then tried to stat the object; this time it
>>> failed to stat the object.
>>> After this, I ran the gc list --include-all command and it displayed
>>> nothing but the square brackets [].
>>
>> Was the object larger than 512k? Also, did you do it within the 300
>> seconds after removing the object?
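For reference, that verification sequence could be scripted roughly as follows. This is an untested sketch: 'testbucket' and 'gc-test.bin' are placeholder names, s3cmd stands in for whichever S3 client you use, and the data pool is assumed to be the default .rgw.buckets. The test object should be well over 512k so that it actually has tail parts for gc to reclaim.

    # create and upload a ~10MB test object through the S3 API
    dd if=/dev/urandom of=/tmp/gc-test.bin bs=1M count=10
    s3cmd put /tmp/gc-test.bin s3://testbucket/gc-test.bin

    # record the object's manifest (the raw part names) before deleting it
    radosgw-admin object stat --bucket=testbucket --object=gc-test.bin > /tmp/gc-test.stat.json

    # delete the object, then check that its raw parts are queued for gc
    s3cmd del s3://testbucket/gc-test.bin
    radosgw-admin gc list --include-all

    # a few hours later (gc obj min wait is 300s and processor period 600s here),
    # the entries should have been processed: gone from the gc list and the pool
    radosgw-admin gc list --include-all
    rados -p .rgw.buckets ls > /tmp/pool-after.txt
    # the part names recorded in /tmp/gc-test.stat.json should no longer
    # appear in /tmp/pool-after.txt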
>>
>> There should exist a garbage collection pool (by default .rgw.gc, but
>> it can be something different if you configured your zone differently).
>> Can you verify that you have it, and if so, what does it contain?
>>
>> Yehuda
>>
> Yes, the object was 10M. As soon as I had deleted it from the bucket, I
> ran the command to check garbage collection.
> There is a .rgw.gc pool, we haven't changed it from the default. It
> contains a number of objects, ~7800, but the size of the files is 0kb.
>

They're expected to be 0kb; the data only resides in their omap, and
that's not reflected in the object size. You could run 'rados
listomapkeys' on these.

>>> Maybe garbage collection isn't working properly..
>>>
>>> Our gc settings are the following; we have 2 object gateways in our
>>> cluster too, client.radosgw.obj01 and client.radosgw.obj02 (from
>>> ceph.conf):
>>> [client.radosgw.obj01]
>>> rgw dns name = ceph.###.###
>>> host = obj01
>>> keyring = /etc/ceph/keyring.radosgw.obj01
>>> rgw socket path = /tmp/radosgw.sock
>>> log file = /var/log/ceph/radosgw.log
>>> rgw data = /var/lib/ceph/radosgw/obj01
>>> rgw thread pool size = 128
>>> rgw print continue = True
>>> debug rgw = 0
>>> rgw enable ops log = False
>>> log to stderr = False
>>> rgw enable usage log = False
>>> rgw gc max objs = 7877

You should put this line (rgw gc max objs) in the global section of your
ceph.conf. Either that, or run your radosgw-admin command with
'-n client.radosgw.obj02'. That might change some of the results you're
seeing (radosgw-admin gc list --include-all, etc.).

Yehuda

>>> rgw gc obj min wait = 300
>>> rgw gc processor period = 600
>>> rgw init timeout = 180
>>> rgw gc processor max time = 600
>>> [client.radosgw.obj02]
>>> rgw dns name = ceph.###.###
>>> host = obj02
>>> keyring = /etc/ceph/keyring.radosgw.obj02
>>> rgw socket path = /tmp/radosgw.sock
>>> log file = /var/log/ceph/radosgw.log
>>> rgw data = /var/lib/ceph/radosgw/obj02
>>> rgw thread pool size = 128
>>> rgw print continue = True
>>> debug rgw = 0
>>> rgw enable ops log = False
>>> log to stderr = False
>>> rgw enable usage log = False
>>> rgw gc max objs = 7877
>>> rgw gc obj min wait = 300
>>> rgw gc processor period = 600
>>> rgw init timeout = 180
>>> rgw gc processor max time = 600
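As a rough, untested sketch of the two suggestions above (assuming the
default .rgw.gc pool and the instance names from the ceph.conf fragment):

    # the gc objects are expected to be 0kb; the pending entries live in
    # their omap, so count the omap keys rather than looking at object sizes
    for obj in $(rados -p .rgw.gc ls); do
        rados -p .rgw.gc listomapkeys "$obj"
    done | wc -l

    # run radosgw-admin as one of the gateway instances so it picks up the
    # per-instance 'rgw gc max objs = 7877' setting instead of the default
    radosgw-admin -n client.radosgw.obj02 gc list --include-all

Alternatively, moving the setting into the global section of ceph.conf
makes every instance (and any plain radosgw-admin run) see the same value:

    [global]
    rgw gc max objs = 7877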