On 2014-12-02 08:39, Yehuda Sadeh wrote:
On Sat, Nov 29, 2014 at 2:26 PM, Ben <b@benjackson.email> wrote:
On 29/11/14 11:40, Yehuda Sadeh wrote:
On Fri, Nov 28, 2014 at 1:38 PM, Ben <b@benjackson.email>
wrote:
On 29/11/14 01:50, Yehuda Sadeh wrote:
On Thu, Nov 27, 2014 at 9:22 PM, Ben <b@benjackson.email>
wrote:
On 2014-11-28 15:42, Yehuda Sadeh wrote:
On Thu, Nov 27, 2014 at 2:15 PM, b <b@benjackson.email>
wrote:
On 2014-11-27 11:36, Yehuda Sadeh wrote:
On Wed, Nov 26, 2014 at 3:49 PM, b <b@benjackson.email>
wrote:
On 2014-11-27 10:21, Yehuda Sadeh wrote:
On Wed, Nov 26, 2014 at 3:09 PM, b <b@benjackson.email>
wrote:
On 2014-11-27 09:38, Yehuda Sadeh wrote:
On Wed, Nov 26, 2014 at 2:32 PM, b
<b@benjackson.email>
wrote:
I've been deleting a bucket which originally had
60TB of
data
in
it,
with
our cluster doing only 1 replication, the total
usage was
120TB.
I've been deleting the objects slowly using S3
browser,
and
I
can
see
the
bucket usage is now down to around 2.5TB or 5TB with
duplication,
but
the
usage in the cluster has not changed.
I've looked at garbage collection (radosgw-admin gc
list
--include
all)
and
it just reports square brackets "[]"
I've run radosgw-admin temp remove
--date=2014-11-20, and
it
doesn't
appear
to have any effect.
Is there a way to check where this space is being
consumed?
Running 'ceph df' the USED space in the buckets pool
is
not
showing
any
of
the 57TB that should have been freed up from the
deletion
so
far.
Running 'radosgw-admin bucket stats | jshon | grep
size_kb_actual'
and
adding up all the buckets usage, this shows that the
space
has
been
freed
from the bucket, but the cluster is all sorts of
messed
up.
ANY IDEAS? What can I look at?
Can you run 'radosgw-admin gc list --include-all'?
Yehuda
I've done it before, and it just returns square
brackets []
(see
below)
radosgw-admin gc list --include-all
[]
Do you know which of the rados pools have all that
extra
data?
Try
to
list that pool's objects, verify that there are no
surprises
there
(e.g., use 'rados -p <pool> ls').
Yehuda
I'm just running that command now, and its taking some
time.
There
is
a
large number of objects.
Once it has finished, what should I be looking for?
I assume the pool in question is the one that holds your
objects
data?
You should be looking for objects that are not expected
to
exist
anymore, and objects of buckets that don't exist anymore.
The
problem
here is to identify these.
I suggest starting by looking at all the existing
buckets,
compose
a
list of all the bucket prefixes for the existing buckets,
and
try
to
look whether there are objects that have different
prefixes.
Yehuda
Any ideas? I've found the prefix, the number of objects in
the
pool
that
match that prefix numbers in the 21 millions
The actual 'radosgw-admin bucket stats' command reports it
as
only
having
1.2 million.
Well, the objects you're seeing are raw objects, and since
rgw
stripes
the data, it is expected to have more raw objects than
objects
in
the
bucket. Still, it seems that you have much too many of
these.
You
can
try to check whether there are pending multipart uploads
that
were
never completed using the S3 api.
At the moment there's no easy way to figure out which raw
objects
are
not supposed to exist. The process would be like this:
1. rados ls -p <data pool>
keep the list sorted
2. list objects in the bucket
3. for each object in (2), do: radosgw-admin object stat
--bucket=<bucket> --object=<object>
--rgw-cache-enabled=false
(disabling the cache so that it goes quicker)
4. look at the result of (3), and generate a list of all
the
parts.
5. sort result of (4), compare it to (1)
Note that if you're running firefly or later, the raw
objects
are
not
specified explicitly in the command you run at (3), so you
might
need
a different procedure, e.g., find out the raw objects
random
string
that is being used, remove it from the list generated in 1,
etc.)
That's basically it.
I'll be interested to figure out what happened, why the
garbage
collection didn't work correctly. You could try verifying
that
it's
working by:
- create an object (let's say ~10MB in size).
- radosgw-admin object stat --bucket=<bucket>
--object=<object>
(keep this info, see
- remove the object
- run radosgw-admin gc list --include-all and verify
that the
raw
parts are listed there
- wait a few hours, repeat last step, see that the parts
don't
appear
there anymore
- run rados -p <pool> ls, check to see if the raw
objects
still
exist
Yehuda
Not sure where to go from here, and our cluster is slowly
filling
up
while
not clearing any space.
I did the last section:
I'll be interested to figure out what happened, why the
garbage
collection didn't work correctly. You could try verifying
that
it's
working by:
- create an object (let's say ~10MB in size).
- radosgw-admin object stat --bucket=<bucket>
--object=<object>
(keep this info, see
- remove the object
- run radosgw-admin gc list --include-all and verify
that the
raw
parts are listed there
- wait a few hours, repeat last step, see that the parts
don't
appear
there anymore
- run rados -p <pool> ls, check to see if the raw
objects
still
exist
I added the file, did a stat and it displayed the json
output
I removed the object and then tried to stat the object, this
time
it
failed
to stat the object
After this, I ran the gc list include all command and it
displayed
nothing
but the square brackets []
Was the object larger than 512k? Also, did you do it within
the
300
seconds after removing the object?
There should exist a garbage collection pool (by default
.rgw.gc,
but
it can be something different if you configured your zone
differently), can you verify that you have it, and if so,
what
does
it
contain?
Yehuda
Yes, the object was 10M. As soon as I had deleted it from the
bucket,
I
ran
the command to check garbage collection.
There is a .rgw.gc pool, we haven't changed it from default.
It
contains
a
number of objects ~7800, but the size of the files is 0kb
They're expected to be 0kb, the data only resides in their
omap, and
that's not reflected in the objects size. You could run 'rados
listomapkeys' on these.
Maybe garbage collection isn't working properly..
our gc settings are the following, we have 2 object gateways
in
our
cluster
too client.radosgw.obj01 and client.radosgw.obj02 (from
ceph.conf)
[client.radosgw.obj01]
rgw dns name = ceph.###.###
host = obj01
keyring = /etc/ceph/keyring.radosgw.obj01
rgw socket path = /tmp/radosgw.sock
log file = /var/log/ceph/radosgw.log
rgw data = /var/lib/ceph/radosgw/obj01
rgw thread pool size = 128
rgw print continue = True
debug rgw = 0
rgw enable ops log = False
log to stderr = False
rgw enable usage log = False
rgw gc max objs = 7877
You should put this line (rgw gx max objs) in the global
section of
your ceph.conf. Either that, or run your radosgw-admin command
with
'-n client.radosgw.obj02'. That might change some of the
results
you're seeing (radosgw-admin gc list --include-all, etc.).
Yehuda
rgw gc obj min wait = 300
rgw gc processor period = 600
rgw init timeout = 180
rgw gc processor max time = 600
[client.radosgw.obj02]
rgw dns name = ceph.###.###
host = obj02
keyring = /etc/ceph/keyring.radosgw.obj02
rgw socket path = /tmp/radosgw.sock
log file = /var/log/ceph/radosgw.log
rgw data = /var/lib/ceph/radosgw/obj02
rgw thread pool size = 128
rgw print continue = True
debug rgw = 0
rgw enable ops log = False
log to stderr = False
rgw enable usage log = False
rgw gc max objs = 7877
rgw gc obj min wait = 300
rgw gc processor period = 600
rgw init timeout = 180
rgw gc processor max time = 600
I've finally deleted the entire bucket. All 60TB cleared from
the
bucket,
the bucket no longer exists.
Yet running rados ls -p .rgw.buckets | grep '4804.14' still
lists all
the
_shadow_ files that have that buckets prefix.
Any ideas why these aren't being deleted/cleared up by garbage
collection?
Are there any errors in the log? Can you provide a log (debug rgw
=
20, debug ms = 1) of the radosgw through the garbage collection
stage?
Yehuda
Hi Yehuda,
Here is a snapshot of the log.
I created a new bucket, uploaded a 25mb file and then deleted it
i did a gc list --include all and could see the shadow files
I then forced a gc process, and the log has gone crazy with this
stuff
in
it, and it is still going.
2014-12-02 09:07:59.634544 7f86256f3700 1 -- 10.150.2.2:0/1009990
-->
10.150.2.13:6816/17803 -- ping v1 -- ?+0 0x7f8558000d20 con
0x1e9c470
2014-12-02 09:07:59.634578 7f86256f3700 1 -- 10.150.2.2:0/1009990
-->
10.150.2.12:6830/3830 -- ping v1 -- ?+0 0x7f8558001120 con
0x1e93090
2014-12-02 09:07:59.634592 7f86256f3700 1 -- 10.150.2.2:0/1009990
-->
10.150.2.14:6818/3517 -- ping v1 -- ?+0 0x7f8558001700 con
0x1e9d230
2014-12-02 09:07:59.634607 7f86256f3700 1 -- 10.150.2.2:0/1009990
-->
10.150.2.10:6806/3397 -- ping v1 -- ?+0 0x7f8558006050 con
0x1e9e0e0
2014-12-02 09:07:59.634621 7f86256f3700 1 -- 10.150.2.2:0/1009990
-->
10.150.2.19:6830/4000 -- ping v1 -- ?+0 0x7f8558000a10 con
0x1e9fdc0
2014-12-02 09:07:59.634634 7f86256f3700 1 -- 10.150.2.2:0/1009990
-->
10.150.2.11:6809/3451 -- ping v1 -- ?+0 0x7f8558001d50 con
0x1e9ef10
2014-12-02 09:07:59.634647 7f86256f3700 1 -- 10.150.2.2:0/1009990
-->
10.150.2.19:6803/3449 -- ping v1 -- ?+0 0x7f8558001fe0 con
0x1ea1aa0
2014-12-02 09:07:59.634660 7f86256f3700 1 -- 10.150.2.2:0/1009990
-->
10.150.2.17:6827/14001 -- ping v1 -- ?+0 0x7f85580028c0 con
0x1ea0bf0
2014-12-02 09:07:59.637762 7f86277f8700 1 -- 10.150.2.2:0/1009990
<==
osd.108 10.150.2.18:6827/13867 158 ==== osd_op_reply(14761 gc.1283
[call]
v0'0 uv98856 ondisk = 0) v6 ==== 174+0+11 (1634423764 0
1993775135)
0x7f8570006050 con 0x1f1e4d0
2014-12-02 09:07:59.637937 7f8630ba57c0 1 -- 10.150.2.2:0/1009990
-->
10.150.2.18:6827/13867 -- osd_op(client.6359480.0:14762 gc.1283
[call
lock.unlock] 6.10bb9719 ondisk+write e19074) v4 -- ?+0 0x1f30a80
con
0x1f1e4d0
2014-12-02 09:07:59.652166 7f86277f8700 1 -- 10.150.2.2:0/1009990
<==
osd.108 10.150.2.18:6827/13867 159 ==== osd_op_reply(14762 gc.1283
[call]
v19074'98857 uv98857 ondisk = 0) v6 ==== 174+0+0 (1358419719 0 0)
0x7f8570006050 con 0x1f1e4d0
2014-12-02 09:07:59.652293 7f8630ba57c0 1 -- 10.150.2.2:0/1009990
-->
10.150.2.15:6805/1032 -- osd_op(client.6359480.0:14763 gc.1284
[call
lock.lock] 6.4a5f5aa9 ondisk+write e19074) v4 -- ?+0 0x1f30a80 con
0x1f290a0
2014-12-02 09:07:59.670108 7f86277f8700 1 -- 10.150.2.2:0/1009990
<==
osd.53 10.150.2.15:6805/1032 130 ==== osd_op_reply(14763 gc.1284
[call]
v19074'100398 uv100398 ondisk = 0) v6 ==== 174+0+0 (3807569510 0
0)
0x7f85b0000b60 con 0x1f290a0
2014-12-02 09:07:59.670239 7f8630ba57c0 1 -- 10.150.2.2:0/1009990
-->
10.150.2.15:6805/1032 -- osd_op(client.6359480.0:14764 gc.1284
[call
rgw.gc_list] 6.4a5f5aa9 ack+read e19074) v4 -- ?+0 0x1f30a80 con
0x1f290a0
2014-12-02 09:07:59.673871 7f86277f8700 1 -- 10.150.2.2:0/1009990
<==
osd.53 10.150.2.15:6805/1032 131 ==== osd_op_reply(14764 gc.1284
[call]
v0'0
uv100398 ondisk = 0) v6 ==== 174+0+11 (4101371043 0 1993775135)
0x7f85b0000b60 con 0x1f290a0
2014-12-02 09:07:59.674042 7f8630ba57c0 1 -- 10.150.2.2:0/1009990
-->
10.150.2.15:6805/1032 -- osd_op(client.6359480.0:14765 gc.1284
[call
lock.unlock] 6.4a5f5aa9 ondisk+write e19074) v4 -- ?+0 0x1f30a80
con
0x1f290a0
2014-12-02 09:07:59.686255 7f86277f8700 1 -- 10.150.2.2:0/1009990
<==
osd.53 10.150.2.15:6805/1032 132 ==== osd_op_reply(14765 gc.1284
[call]
v19074'100399 uv100399 ondisk = 0) v6 ==== 174+0+0 (2220917153 0
0)
0x7f85b0000b60 con 0x1f290a0
2014-12-02 09:07:59.686376 7f8630ba57c0 1 -- 10.150.2.2:0/1009990
-->
10.150.2.19:6812/3767 -- osd_op(client.6359480.0:14766 gc.1285
[call
lock.lock] 6.9f912e07 ondisk+write e19074) v4 -- ?+0 0x1f1f560 con
0x1f0ea90
2014-12-02 09:07:59.721432 7f86277f8700 1 -- 10.150.2.2:0/1009990
<==
osd.83 10.150.2.19:6812/3767 163 ==== osd_op_reply(14766 gc.1285
[call]
v19074'302537 uv302537 ondisk = 0) v6 ==== 174+0+0 (2463619648 0
0)
0x7f85b8000900 con 0x1f0ea90
2014-12-02 09:07:59.721591 7f8630ba57c0 1 -- 10.150.2.2:0/1009990
-->
10.150.2.19:6812/3767 -- osd_op(client.6359480.0:14767 gc.1285
[call
rgw.gc_list] 6.9f912e07 ack+read e19074) v4 -- ?+0 0x1f30a80 con
0x1f0ea90
2014-12-02 09:07:59.725699 7f86277f8700 1 -- 10.150.2.2:0/1009990
<==
osd.83 10.150.2.19:6812/3767 164 ==== osd_op_reply(14767 gc.1285
[call]
v0'0
uv302537 ondisk = 0) v6 ==== 174+0+11 (3924946656 0 1993775135)
0x7f85b8000900 con 0x1f0ea90
2014-12-02 09:07:59.725868 7f8630ba57c0 1 -- 10.150.2.2:0/1009990
-->
10.150.2.19:6812/3767 -- osd_op(client.6359480.0:14768 gc.1285
[call
lock.unlock] 6.9f912e07 ondisk+write e19074) v4 -- ?+0 0x1f1f560
con
0x1f0ea90
2014-12-02 09:07:59.738287 7f86277f8700 1 -- 10.150.2.2:0/1009990
<==
osd.83 10.150.2.19:6812/3767 165 ==== osd_op_reply(14768 gc.1285
[call]
v19074'302538 uv302538 ondisk = 0) v6 ==== 174+0+0 (314638285 0 0)
0x7f85b8000900 con 0x1f0ea90
2014-12-02 09:07:59.738415 7f8630ba57c0 1 -- 10.150.2.2:0/1009990
-->
10.150.2.10:6803/3331 -- osd_op(client.6359480.0:14769 gc.1286
[call
lock.lock] 6.61beb660 ondisk+write e19074) v4 -- ?+0 0x1f30a80 con
0x1f1ce50
2014-12-02 09:07:59.764523 7f86277f8700 1 -- 10.150.2.2:0/1009990
<==
osd.1
10.150.2.10:6803/3331 145 ==== osd_op_reply(14769 gc.1286 [call]
v19074'136677 uv136677 ondisk = 0) v6 ==== 174+0+0 (3559159328 0
0)
0x7f8540002dd0 con 0x1f1ce50
2014-12-02 09:07:59.764670 7f8630ba57c0 1 -- 10.150.2.2:0/1009990
-->
10.150.2.10:6803/3331 -- osd_op(client.6359480.0:14770 gc.1286
[call
rgw.gc_list] 6.61beb660 ack+read e19074) v4 -- ?+0 0x1f1f560 con
0x1f1ce50
2014-12-02 09:07:59.767836 7f86277f8700 1 -- 10.150.2.2:0/1009990
<==
osd.1
10.150.2.10:6803/3331 146 ==== osd_op_reply(14770 gc.1286 [call]
v0'0
uv136677 ondisk = 0) v6 ==== 174+0+11 (1532209036 0 1993775135)
0x7f8540002dd0 con 0x1f1ce50
2014-12-02 09:07:59.768009 7f8630ba57c0 1 -- 10.150.2.2:0/1009990
-->
10.150.2.10:6803/3331 -- osd_op(client.6359480.0:14771 gc.1286
[call
lock.unlock] 6.61beb660 ondisk+write e19074) v4 -- ?+0 0x1f30a80
con
0x1f1ce50
2014-12-02 09:07:59.780792 7f86277f8700 1 -- 10.150.2.2:0/1009990
<==
osd.1
10.150.2.10:6803/3331 147 ==== osd_op_reply(14771 gc.1286 [call]
v19074'136678 uv136678 ondisk = 0) v6 ==== 174+0+0 (1412649901 0
0)
0x7f8540002dd0 con 0x1f1ce50
2014-12-02 09:07:59.780968 7f8630ba57c0 1 -- 10.150.2.2:0/1009990
-->
10.150.2.12:6833/4016 -- osd_op(client.6359480.0:14772 gc.1287
[call
lock.lock] 6.9bfdc387 ondisk+write e19074) v4 -- ?+0 0x1f1f560 con
0x1f3b2f0
2014-12-02 09:07:59.804154 7f86277f8700 1 -- 10.150.2.2:0/1009990
<==
osd.65 10.150.2.12:6833/4016 118 ==== osd_op_reply(14772 gc.1287
[call]
v19074'279063 uv279063 ondisk = 0) v6 ==== 174+0+0 (4117862336 0
0)
0x7f85d80024b0 con 0x1f3b2f0
2014-12-02 09:07:59.804279 7f8630ba57c0 1 -- 10.150.2.2:0/1009990
-->
10.150.2.12:6833/4016 -- osd_op(client.6359480.0:14773 gc.1287
[call
rgw.gc_list] 6.9bfdc387 ack+read e19074) v4 -- ?+0 0x1f30a80 con
0x1f3b2f0