Re: Recover Data from Deleted RBD Volume

On Tue, Aug 9, 2016 at 7:39 AM, George Mihaiescu <lmihaiescu@xxxxxxxxx> wrote:
> Look in the cinder DB, in the volumes table, to find the UUID of the deleted volume.

You could also look through the logs from the time of the delete; I
suspect you should be able to see how the rbd image was prefixed/named
at that point.
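
A minimal sketch of the cinder DB route George mentions (assuming a
MySQL-backed cinder database; DB name, credentials and column names vary
per deployment, so treat this as a starting point only):

  # hedged example: list recently deleted volumes and their UUIDs
  $ mysql -u cinder -p cinder -e "SELECT id, display_name, deleted_at \
      FROM volumes WHERE deleted = 1 ORDER BY deleted_at DESC LIMIT 20;"

The id column gives the UUID; with the cinder RBD driver the image name
should then be volume-<that UUID>.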

HTH,
Brad

>
> If you go through your OSDs and look at the directories for the PGs of pool 20 (the volumes pool), you might find some fragments of the deleted volume, but it's a long shot...
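>
> For what it's worth, on a FileStore OSD (the default with 0.80.x) a hedged
> way to locate those PG directories (default paths assumed):
>
>   # list the PG directories of pool 20 on each local OSD
>   find /var/lib/ceph/osd/ceph-*/current -maxdepth 1 -type d -name '20.*_head'
>
> Anything belonging to the deleted volume would have to be carved out of the
> free space of the filesystems holding those directories, though.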
>
>> On Aug 8, 2016, at 4:39 PM, Georgios Dimitrakakis <giorgis@xxxxxxxxxxxx> wrote:
>>
>> Dear David (and all),
>>
>> the data are considered very critical, hence all this effort to recover them.
>>
>> Although the cluster hasn't been fully stopped, all user actions have been. I mean the services are running, but users are not able to read/write/delete.
>>
>> The deleted image was exactly the same size as the example (500GB), but it wasn't the only one deleted today. Our user was trying to do a "massive" cleanup by deleting 11 volumes, and unfortunately one of them was very important.
>>
>> Let's assume that I "dd" all the drives; what further steps should I take to recover the files? Could you please elaborate a bit more on the phrase "If you've never deleted any other rbd images and assuming you can recover data with names, you may be able to find the rbd objects"?
>>
>> Do you mean that if I know the file names I can go through and check for them? How?
>> Do I have to know *all* the file names, or can I find all the existing data by searching for just a few of them?
>>
>> Thanks a lot for taking the time to answer my questions!
>>
>> All the best,
>>
>> G.
>>
>>> I don't think there's a way of getting the prefix from the cluster at
>>> this point.
>>>
>>> If the deleted image was a similar size to the example you've given,
>>> you will likely have had objects on every OSD. If this data is
>>> absolutely critical, you need to stop your cluster immediately or make
>>> copies of all the drives with something like dd. If you've never
>>> deleted any other rbd images, and assuming you can recover data with
>>> names, you may be able to find the rbd objects.
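>>>
>>> A hedged sketch of the "copies of all the drives" step (device and target
>>> paths are placeholders; take the copy with the OSD stopped):
>>>
>>>   # image the OSD data disk to spare storage before attempting anything else
>>>   dd if=/dev/sdb of=/mnt/spare/osd-3-sdb.img bs=4M conv=noerror,sync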
>>>
>>> On Mon, Aug 8, 2016 at 7:28 PM, Georgios Dimitrakakis  wrote:
>>>
>>>>>> Hi,
>>>>>>
>>>>>> On 08.08.2016 10:50, Georgios Dimitrakakis wrote:
>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>>> On 08.08.2016 09:58, Georgios Dimitrakakis wrote:
>>>>>>>>>
>>>>>>>>> Dear all,
>>>>>>>>>
>>>>>>>>> I would like your help with an emergency issue but first
>>>>>>>>> let me describe our environment.
>>>>>>>>>
>>>>>>>>> Our environment consists of 2 OSD nodes with 10x 2TB HDDs
>>>>>>>>> each and 3 MON nodes (2 of them are the OSD nodes as well),
>>>>>>>>> all running ceph version 0.80.9
>>>>>>>>> (b5a67f0e1d15385bc0d60a6da6e7fc810bde6047)
>>>>>>>>>
>>>>>>>>> This environment provides RBD volumes to an OpenStack
>>>>>>>>> Icehouse installation.
>>>>>>>>>
>>>>>>>>> Although not a state-of-the-art environment, it is working
>>>>>>>>> well and within our expectations.
>>>>>>>>>
>>>>>>>>> The issue now is that one of our users accidentally
>>>>>>>>> deleted one of the volumes without backing up its data first!
>>>>>>>>>
>>>>>>>>> Is there any way (since the data are considered critical
>>>>>>>>> and very important) to recover them from CEPH?
>>>>>>>>
>>>>>>>> Short answer: no
>>>>>>>>
>>>>>>>> Long answer: no, but....
>>>>>>>>
>>>>>>>> Consider the way Ceph stores data... each RBD is striped into chunks
>>>>>>>> (RADOS objects, 4MB in size by default); the chunks are distributed
>>>>>>>> among the OSDs with the configured number of replicas (probably two
>>>>>>>> in your case since you use 2 OSD hosts). RBD uses thin provisioning,
>>>>>>>> so chunks are allocated upon first write access.
>>>>>>>> If an RBD is deleted, all of its chunks are deleted on the
>>>>>>>> corresponding OSDs. If you want to recover a deleted RBD, you need to
>>>>>>>> recover all of its individual chunks. Whether this is possible depends
>>>>>>>> on your filesystem and whether the space of a former chunk has already
>>>>>>>> been assigned to other RADOS objects. The RADOS object names are
>>>>>>>> composed of the RBD name and the offset position of the chunk, so if
>>>>>>>> an undelete mechanism exists for the OSD's filesystem, you have to be
>>>>>>>> able to recover files by their filenames, otherwise you might end up
>>>>>>>> mixing the content of various deleted RBDs. Due to the thin
>>>>>>>> provisioning there might be some chunks missing (e.g. never allocated
>>>>>>>> before).
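>>>>>>>>
>>>>>>>> (To put a number on it: with the default 4 MB chunks, a 500 GB volume
>>>>>>>> corresponds to 500 * 1024 / 4 = 128,000 RADOS objects, each of which
>>>>>>>> would have to be identified and recovered individually.)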
>>>>>>>>
>>>>>>>> Given the fact that
>>>>>>>> - you probably use XFS on the OSDs since it is the preferred
>>>>>>>> filesystem for OSDs (there is RDR-XFS, but I've never had to use it)
>>>>>>>> - you would need to stop the complete ceph cluster (recovery tools do
>>>>>>>> not work on mounted filesystems)
>>>>>>>> - your cluster has been in use after the RBD was deleted, and thus
>>>>>>>> parts of its former space might already have been overwritten
>>>>>>>> (replication might help you here, since there are two OSDs to try)
>>>>>>>> - XFS undelete does not work well on fragmented files (and OSDs tend
>>>>>>>> to introduce fragmentation...)
>>>>>>>>
>>>>>>>> the answer is no, since it might not be feasible and the chances of
>>>>>>>> success are way too low.
>>>>>>>>
>>>>>>>> If you want to spend time on it, I would propose to stop the ceph
>>>>>>>> cluster as soon as possible, create copies of all involved OSDs, start
>>>>>>>> the cluster again and attempt the recovery on the copies.
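>>>>>>>>
>>>>>>>> A hedged sketch of the "work on the copies" part, assuming the copies
>>>>>>>> were taken of the OSD data partitions themselves (all paths are
>>>>>>>> placeholders):
>>>>>>>>
>>>>>>>>   # attach the copy and mount it strictly read-only, without XFS log replay
>>>>>>>>   losetup -f --show /mnt/spare/osd-3-sdb.img    # prints e.g. /dev/loop0
>>>>>>>>   mount -t xfs -o ro,norecovery /dev/loop0 /mnt/osd-3-copy
>>>>>>>>
>>>>>>>> The read-only mount is only for inspecting the surviving objects; any
>>>>>>>> undelete/carving tool would be pointed at the unmounted loop device or
>>>>>>>> the image file itself, never at the live OSDs.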
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Burkhard
>>>>>>>
>>>>>>> Hi! Thanks for the info... I understand that this is a very
>>>>>>> difficult and probably not feasible task, but in case I need to
>>>>>>> attempt a recovery, what other info would I need? Can I somehow
>>>>>>> find out on which OSDs the specific data were stored and
>>>>>>> narrow my search to those?
>>>>>>> Any ideas on how I should proceed?
>>>>>> First of all you need to know the exact object names for the RADOS
>>>>>> objects. As mentioned before, the name is composed of the RBD name and
>>>>>> an offset.
>>>>>>
>>>>>> In the case of OpenStack, there are three different patterns for
>>>>>> RBD names:
>>>>>>
>>>>>> <image UUID>, e.g. 50f2a0bd-15b1-4dbb-8d1f-fc43ce535f13,
>>>>>> for glance images,
>>>>>> <instance UUID>_disk, e.g. 9aec1f45-9053-461e-b176-c65c25a48794_disk,
>>>>>> for nova images,
>>>>>> volume-<volume UUID>, e.g. volume-0ca52f58-7e75-4b21-8b0f-39cbcd431c42,
>>>>>> for cinder volumes
>>>>>>
>>>>>> (not considering snapshots etc., which might use different
>>>>>> patterns)
>>>>>>
>>>>>> The RBD chunks are created using a certain prefix (using examples
>>>>>> from our OpenStack setup):
>>>>>>
>>>>>> # rbd -p os-images info 8fa3d9eb-91ed-4c60-9550-a62f34aed014
>>>>>> rbd image 8fa3d9eb-91ed-4c60-9550-a62f34aed014:
>>>>>>     size 446 MB in 56 objects
>>>>>>     order 23 (8192 kB objects)
>>>>>>     block_name_prefix: rbd_data.30e57d54dea573
>>>>>>     format: 2
>>>>>>     features: layering, striping
>>>>>>     flags:
>>>>>>     stripe unit: 8192 kB
>>>>>>     stripe count: 1
>>>>>>
>>>>>> # rados -p os-images ls | grep rbd_data.30e57d54dea573
>>>>>> rbd_data.30e57d54dea573.0000000000000015
>>>>>> rbd_data.30e57d54dea573.0000000000000008
>>>>>> rbd_data.30e57d54dea573.000000000000000a
>>>>>> rbd_data.30e57d54dea573.000000000000002d
>>>>>> rbd_data.30e57d54dea573.0000000000000032
>>>>>>
>>>>>> I don't know whether the prefix is derived from some other
>>>>>> information, but to recover the RBD you definitely need it.
>>>>>>
>>>>>> _If_ you are able to recover the prefix, you can use "ceph osd map
>>>>>> <pool> <object name>" to find the OSDs for each chunk:
>>>>>>
>>>>>> # ceph osd map os-images rbd_data.30e57d54dea573.000000000000001a
>>>>>> osdmap e418590 pool os-images (38) object
>>>>>> rbd_data.30e57d54dea573.000000000000001a -> pg 38.d5d81d65 (38.65)
>>>>>> -> up ([45,17,108], p45) acting ([45,17,108], p45)
>>>>>>
>>>>>> With 20 OSDs in your case you will likely have to process all of them
>>>>>> if the RBD has a size of several GBs.
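>>>>>>
>>>>>> A hedged sketch of that enumeration (PREFIX and the pool name "volumes"
>>>>>> are placeholders; offsets are 16-digit zero-padded hex, and a 500 GB
>>>>>> image with 4 MB chunks spans offsets 0..127999):
>>>>>>
>>>>>>   # map every candidate object name to its PG/OSDs
>>>>>>   for i in $(seq 0 127999); do
>>>>>>       printf 'rbd_data.PREFIX.%016x\n' "$i"
>>>>>>   done | xargs -n1 ceph osd map volumes
>>>>>>
>>>>>> Because of thin provisioning many of these objects may never have
>>>>>> existed, but the CRUSH mapping still tells you which OSDs would have
>>>>>> held them.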
>>>>>>
>>>>>> Regards,
>>>>>> Burkhard
>>>>>
>>>>> Is it possible to get the prefix if the RBD has already been deleted?
>>>>> Is this info stored somewhere? Can I retrieve it in another way besides
>>>>> "rbd info"? Because when I try to get it using the "rbd info" command,
>>>>> I unfortunately get the following error:
>>>>>
>>>>> "librbd::ImageCtx: error finding header: (2) No such file or
>>>>> directory"
>>>>>
>>>>> Any ideas?
>>>>>
>>>>> Best regards,
>>>>>
>>>>> G.
>>>>
>>>> Here is some more info from the cluster:
>>>>
>>>> $ ceph df
>>>> GLOBAL:
>>>>     SIZE       AVAIL      RAW USED     %RAW USED
>>>>     74373G     72011G        2362G          3.18
>>>> POOLS:
>>>>     NAME                   ID     USED      %USED     MAX AVAIL     OBJECTS
>>>>     data                   3      0         0         35849G        0
>>>>     metadata               4      1884      0         35849G        20
>>>>     rbd                    5      0         0         35849G        0
>>>>     .rgw                   6      1374      0         35849G        8
>>>>     .rgw.control           7      0         0         35849G        8
>>>>     .rgw.gc                8      0         0         35849G        32
>>>>     .log                   9      0         0         35849G        0
>>>>     .intent-log            10     0         0         35849G        0
>>>>     .usage                 11     0         0         35849G        3
>>>>     .users                 12     33        0         35849G        3
>>>>     .users.email           13     22        0         35849G        2
>>>>     .users.swift           14     22        0         35849G        2
>>>>     .users.uid             15     985       0         35849G        4
>>>>     .rgw.root              16     840       0         35849G        3
>>>>     .rgw.buckets.index     17     0         0         35849G        4
>>>>     .rgw.buckets           18     170G      0.23      35849G        810128
>>>>     .rgw.buckets.extra     19     0         0         35849G        1
>>>>     volumes                20     1004G     1.35      35849G        262613
>>>>
>>>> Obviously the RBD volumes provided to OpenStack are stored in the
>>>> "volumes" pool, so trying to
>>>> figure out the prefix for the volume in question
>>>> "volume-a490aa0c-6957-4ea2-bb5b-e4054d3765ad" produces the
>>>> following:
>>>>
>>>> $ rbd -p volumes info volume-a490aa0c-6957-4ea2-bb5b-e4054d3765ad
>>>> rbd: error opening image
>>>> volume-a490aa0c-6957-4ea2-bb5b-e4054d3765ad: (2) No such file or
>>>> directory
>>>> 2016-08-09 03:04:56.250977 7fa9ba1ca760 -1 librbd::ImageCtx: error
>>>> finding header: (2) No such file or directory
>>>>
>>>> On the other hand, for a volume that already exists and is working
>>>> normally, I get the following:
>>>>
>>>> $ rbd -p volumes info volume-2383fc3a-2b6f-49b4-a3f5-f840569edb73
>>>> rbd image volume-2383fc3a-2b6f-49b4-a3f5-f840569edb73:
>>>>         size 500 GB in 128000 objects
>>>>         order 22 (4096 kB objects)
>>>>         block_name_prefix: rbd_data.fb1bb3136c3ec
>>>>         format: 2
>>>>         features: layering
>>>>
>>>> and can also get the OSD mapping etc.
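>>>>
>>>> (A hedged aside, assuming format 2 images as the "format: 2" above
>>>> suggests: the name-to-prefix mapping is kept in a small
>>>> rbd_id.<image name> object in the same pool, and that object is removed
>>>> together with the image. For example:
>>>>
>>>> # pool and UUIDs taken from the outputs above; /tmp/rbd_id is just a scratch file
>>>> $ rados -p volumes ls | grep rbd_id.volume-a490aa0c-6957-4ea2-bb5b-e4054d3765ad
>>>> $ rados -p volumes get rbd_id.volume-2383fc3a-2b6f-49b4-a3f5-f840569edb73 /tmp/rbd_id
>>>> $ strings /tmp/rbd_id
>>>>
>>>> The first command should come back empty for the deleted volume, while
>>>> the last one should print the id part of the surviving volume's
>>>> block_name_prefix, i.e. fb1bb3136c3ec.)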
>>>>
>>>> Does that mean that there is no way to find out on which OSDs the
>>>> deleted volume was placed?
>>>> If that's the case, then it's not possible to recover the data... Am I
>>>> right?
>>>>
>>>> Any other ideas, people?
>>>>
>>>> Looking forward to your comments... please...
>>>>
>>>> Best regards,
>>>>
>>>> G.
>>
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


