Samuel,

Back again. I converted my cluster to use 24 filestore OSDs, and ran the
following test three times:

rbd -p ssd_replica create --size 100G image1

rbd --pool ssd_replica bench-write --io-size 2M --io-threads 16 --io-total 100G --io-pattern seq image1

rbd -p ssd_replica rm image1

and in each case the rbd image was removed successfully; rados showed no
leftover objects other than 'rbd_directory', which I could remove with
rados. (The pool is empty other than for this one image.)

I then converted the cluster back to all-bluestore OSDs. For my first run
I only created a 10G-sized image1, and that was removed successfully. I
then repeated the above with 100G, and the problem reappeared; I again
have objects I cannot remove. (bluestore warnings removed for brevity.)

[root@alpha1-p200 ~]# rbd -p ssd_replica ls
image1
[root@alpha1-p200 ~]# rbd -p ssd_replica rm image1
Removing image: 100% complete...done.
[root@alpha1-p200 ~]# rbd -p ssd_replica ls
image1
[root@alpha1-p200 ~]# rados -p ssd_replica ls | wc -l
8582

This is slightly different from last time in that the rbd image 'image1'
still appears in the pool; I no longer get the "No such file or
directory" errors for the rbd image itself. But I do get those errors for
the leftover objects that make up the image when I try to remove them:

[root@alpha1-p200 ~]# rados -p ssd_replica rm rbd_data.112d238e1f29.0000000000002d3d
error removing ssd_replica>rbd_data.112d238e1f29.0000000000002d3d: (2) No such file or directory

I'm not sure where to go from here. Suggestions?

Kevan

On 5/24/16, 4:11 PM, "Kevan Rehm" <krehm@xxxxxxxx> wrote:

>Okay, will do. If the problem goes away with filestore, I'll switch back
>to bluestore again and re-duplicate the problem. In that case, are there
>particular things you would like me to collect? Or clues I should look
>for in logs?
>
>Thanks, Kevan
>
>On 5/24/16, 4:06 PM, "Samuel Just" <sjust@xxxxxxxxxx> wrote:
>
>>My money is on bluestore. If you can try to reproduce on filestore,
>>that would rapidly narrow it down.
>>-Sam
>>
>>On Tue, May 24, 2016 at 1:53 PM, Kevan Rehm <krehm@xxxxxxxx> wrote:
>>> Nope, not using tiering.
>>>
>>> Also, this is my second attempt; this is repeatable for me. I'm trying
>>> to duplicate a previous occurrence of this same problem to collect
>>> useful debug data. In the previous case, I was eventually able to get
>>> rid of the objects (but have forgotten how), but that was followed by
>>> 22 of the 24 OSDs crashing hard. Took me quite a while to re-deploy
>>> and get it working again. I want to get a small, repeatable example
>>> for the Ceph guys to look at, assuming it's a bug.
>>>
>>> Don't know if it's related to the bluestore OSDs or not, still getting
>>> my feet wet with Ceph.
>>>
>>> Kevan
>>>
>>> On 5/24/16, 3:47 PM, "Jason Dillaman" <jdillama@xxxxxxxxxx> wrote:
>>>
>>>>Any chance you are using cache tiering? It's odd that you can see the
>>>>objects through "rados ls" but cannot delete them with "rados rm".
>>>>
>>>>On Tue, May 24, 2016 at 4:34 PM, Kevan Rehm <krehm@xxxxxxxx> wrote:
>>>>> Greetings,
>>>>>
>>>>> I have a small Ceph 10.2.1 test cluster using a 3-replica pool
>>>>> based on 24 SSDs configured with bluestore. I created and wrote an
>>>>> rbd image called "image1", then deleted the image again:
>>>>>
>>>>> rbd -p ssd_replica create --size 100G image1
>>>>>
>>>>> rbd --pool ssd_replica bench-write --io-size 2M --io-threads 16
>>>>> --io-total 100G --io-pattern seq image1
>>>>>
>>>>> rbd -p ssd_replica rm image1
>>>>>
>>>>> The rbd rm command completed successfully, but not all the
>>>>> image-related objects disappeared from the pool. The pool still
>>>>> contains:
>>>>>
>>>>> rbd_directory
>>>>>
>>>>> rbd_id.image1
>>>>>
>>>>> rbd_object_map.15ed238e1f29
>>>>>
>>>>> plus 2938 other objects of the form
>>>>> "rbd_data.15ed238e1f29.000000000000xxxx".
>>>>>
>>>>> Attempting to re-delete image1 does not work. Attempting to
>>>>> directly delete object rbd_id.image1 and one of the data objects
>>>>> doesn't work either. The commands all fail with "No such file or
>>>>> directory", yet the objects are obviously there. (Warning messages
>>>>> stripped for brevity.)
>>>>>
>>>>> [root@alpha1-p200 fio]# rbd -p ssd_replica rm image1
>>>>>
>>>>> Removing image: 0% complete...failed.
>>>>>
>>>>> rbd: delete error: (2) No such file or directory
>>>>>
>>>>> [root@alpha1-p200 fio]# rados -p ssd_replica rm rbd_id.image1
>>>>>
>>>>> error removing ssd_replica>rbd_id.image1: (2) No such file or
>>>>> directory
>>>>>
>>>>> [root@alpha1-p200 fio]# rados -p ssd_replica rm rbd_data.15ed238e1f29.0000000000005b56
>>>>>
>>>>> error removing ssd_replica>rbd_data.15ed238e1f29.0000000000005b56:
>>>>> (2) No such file or directory
>>>>>
>>>>> Is this a bug, and if so, where should I look for the cause? Anyone
>>>>> know how to delete these objects?
>>>>>
>>>>> Thanks, Kevan
>>>>>
>>>>> _______________________________________________
>>>>> ceph-users mailing list
>>>>> ceph-users@xxxxxxxxxxxxxx
>>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>>
>>>> --
>>>> Jason