On Wed, May 9, 2018, 8:38 AM Jason Dillaman <jdillama@xxxxxxxxxx> wrote:
On Tue, May 8, 2018 at 2:31 PM, <ceph@xxxxxxxxxx> wrote:
> Hello Jason,
>
>
> On 8 May 2018 at 15:30:34 CEST, Jason Dillaman <jdillama@xxxxxxxxxx> wrote:
>>Perhaps the image had associated snapshots? Deleting the object
>>doesn't delete the associated snapshots, so those objects will remain
>>until the snapshots are removed. However, if you have removed the RBD
>>header, the snapshot ids are now gone.
>>
>
> Hmm... that makes me curious...
>
> So when I have a VM image (RBD) on Ceph and take one or more snapshots of this image... I *must* delete the snapshot(s) completely first, before I delete the original image?
Yup, the rbd CLI (and related librbd API helpers for removing images)
will not let you delete an image that has snapshots for this very
reason.
> How can we then get rid of these orphaned objects when we have accidentally deleted the original image first?
Unfortunately, there isn't any available CLI tooling to let you delete a
"self-managed" snapshot, even if you could determine the correct
snapshot ids that are no longer in use. If you are comfortable building
a custom C/C++ librados application to clean up your pool, you could
first generate a list of all known in-use snapshot IDs by collecting all
the "snapshot_XYZ" keys on all "rbd_header.ABC" objects in the pool
(where XYZ is the snapshot id in hex and ABC is each image's unique id)
and cross-referencing them with the pool's "removed_snaps" output from
"ceph osd pool ls detail".
> Thanks if you have a Bit of Time to clarify me/us :)
>
> - Mehmet
>
>>On Tue, May 8, 2018 at 12:29 AM, Eugen Block <eblock@xxxxxx> wrote:
>>> Hi,
>>>
>>> I have a similar issue and would also need some advice on how to get
>>> rid of the already deleted files.
>>>
>>> Ceph is our OpenStack backend and there was a nova clone without
>>> parent information. Apparently, the base image had been deleted
>>> without a warning or anything, although there were existing clones.
>>> Anyway, I tried to delete the respective rbd_data and _header files
>>> as described in [1]. There were about 700 objects to be deleted, but
>>> 255 objects remained according to the 'rados -p pool ls' command. The
>>> attempt to delete the rest (again) resulted (and still results) in
>>> "No such file or directory". About half an hour later one more object
>>> vanished (an rbd_header file); there are now still 254 objects left
>>> in the pool. At first I thought maybe Ceph would clean up by itself
>>> and it just takes some time, but this was weeks ago and the number of
>>> objects has not changed since then.
>>>
>>> I would really appreciate any help.
>>>
>>> Regards,
>>> Eugen
>>>
>>>
>>> Quoting Jan Marquardt <jm@xxxxxxxxxxx>:
>>>
>>>
>>>> On 30.04.18 at 09:26, Jan Marquardt wrote:
>>>>>
>>>>> On 27.04.18 at 20:48, David Turner wrote:
>>>>>>
>>>>>> This old [1] blog post about removing super large RBDs is not
>>>>>> relevant if you're using object map on the RBDs; however, its
>>>>>> method to manually delete an RBD is still valid. You can see if
>>>>>> this works for you to manually remove the problem RBD you're having.
>>>>>
>>>>>
>>>>> I followed the instructions, but it seems that 'rados -p rbd ls |
>>>>> grep '^rbd_data.221bf2eb141f2.' | xargs -n 200 rados -p rbd rm'
>>>>> gets stuck, too. It has been running since Friday and still is not
>>>>> finished. The rbd image is/was about 1 TB large.
>>>>>
>>>>> Until now the only output was:
>>>>> error removing rbd>rbd_data.221bf2eb141f2.00000000000051d2: (2) No such file or directory
>>>>> error removing rbd>rbd_data.221bf2eb141f2.000000000000e3f2: (2) No such file or directory
>>>>
>>>>
>>>> I am still trying to get rid of this. 'rados -p rbd ls' still shows
>>>> a lot of objects beginning with rbd_data.221bf2eb141f2, but if I try
>>>> to delete them with 'rados -p rbd rm <obj>' it says 'No such file or
>>>> directory'. This is not the behaviour I'd expect. Any ideas?
>>>>
>>>> Besides this, rbd_data.221bf2eb141f2.0000000000016379 is still
>>>> causing the OSDs to crash, which leaves the cluster unusable for us
>>>> at the moment. Even if it's just a proof of concept, I'd like to get
>>>> this fixed without destroying the whole cluster.
>>>>
>>>>>>
>>>>>> [1] http://cephnotes.ksperis.com/blog/2014/07/04/remove-big-rbd-image
>>>>>>
>>>>>> On Thu, Apr 26, 2018 at 9:25 AM Jan Marquardt <jm@xxxxxxxxxxx> wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>>     I am currently trying to delete an rbd image which is seemingly
>>>>>>     causing our OSDs to crash, but it always gets stuck at 3%.
>>>>>>
>>>>>> root@ceph4:~# rbd rm noc_tobedeleted
>>>>>> Removing image: 3% complete...
>>>>>>
>>>>>>     Is there any way to force the deletion? Any other advice?
>>>>>>
>>>>>> Best Regards
>>>>>>
>>>>>> Jan
>>>>
>>>>
>>>>
>>>>
--
Jason
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com