Re: Recover Data from Deleted RBD Volume

Georgios Dimitrakakis <giorgis@xxxxxxxxxxxx> · Mon, 08 Aug 2016 23:39:05 +0300

Dear David (and all),

The data are considered very critical, hence this whole attempt to
recover them.

Although the cluster hasn't been fully stopped, all user actions have.
I mean the services are running, but users are not able to
read/write/delete.

The deleted image was exactly the same size as the example (500GB), but
it wasn't the only one deleted today. Our user was trying to do a
"massive" cleanup by deleting 11 volumes, and unfortunately one of them
was very important.

Let's assume that I "dd" all the drives. What further actions should I
take to recover the files? Could you please elaborate a bit more on the
phrase "If you've never deleted any other rbd images and assuming you
can recover data with names, you may be able to find the rbd objects"?

Do you mean that if I know the file names I can go through and check
for them? How?
Do I have to know *all* file names, or can I find all the data that
exist by searching for a few of them?

Thanks a lot for taking the time to answer my questions!

All the best,

G.

I don't think there's a way of getting the prefix from the cluster at
this point.

If the deleted image was a similar size to the example you've given,
you will likely have had objects on every OSD. If this data is
absolutely critical you need to stop your cluster immediately or make
copies of all the drives with something like dd. If you've never
deleted any other rbd images and assuming you can recover data with
names, you may be able to find the rbd objects.

On Mon, Aug 8, 2016 at 7:28 PM, Georgios Dimitrakakis  wrote:

Hi,

On 08.08.2016 10:50, Georgios Dimitrakakis wrote:

Hi,

On 08.08.2016 09:58, Georgios Dimitrakakis wrote:

Dear all,

I would like your help with an emergency issue but first
let me describe our environment.

Our environment consists of 2 OSD nodes with 10x 2TB HDDs
each and 3 MON nodes (2 of them are the OSD nodes as well),
all with ceph version 0.80.9
(b5a67f0e1d15385bc0d60a6da6e7fc810bde6047)

This environment provides RBD volumes to an OpenStack
Icehouse installation.

Although not a state-of-the-art environment, it is working
well and within our expectations.

The issue now is that one of our users accidentally
deleted one of the volumes without keeping its data first!

Is there any way (since the data are considered critical
and very important) to recover them from CEPH?

Short answer: no

Long answer: no, but....

Consider the way Ceph stores data: each RBD is striped into chunks
(RADOS objects with 4MB size by default); the chunks are distributed
among the OSDs with the configured number of replicas (probably two
in your case, since you use 2 OSD hosts). RBD uses thin provisioning,
so chunks are allocated upon first write access.

If an RBD is deleted, all of its chunks are deleted on the
corresponding OSDs. If you want to recover a deleted RBD, you need to
recover all individual chunks. Whether this is possible depends on
your filesystem and on whether the space of a former chunk has already
been assigned to other RADOS objects. The RADOS object names are
composed of the RBD name and the offset position of the chunk, so if
an undelete mechanism exists for the OSDs' filesystem, you have to be
able to recover files by their filenames; otherwise you might end up
mixing the content of various deleted RBDs. Due to the thin
provisioning, some chunks might be missing (e.g. never allocated
before).

Given the fact that
- you probably use XFS on the OSDs, since it is the preferred
  filesystem for OSDs (there is RDR-XFS, but I've never had to use it)
- you would need to stop the complete Ceph cluster (recovery tools do
  not work on mounted filesystems)
- your cluster has been in use after the RBD was deleted, and thus
  parts of its former space might already have been overwritten
  (replication might help you here, since there are two OSDs to try)
- XFS undelete does not work well on fragmented files (and OSDs tend
  to introduce fragmentation...)

the answer is no, since it might not be feasible and the chances of
success are way too low.

If you want to spend time on it, I would propose to stop the Ceph
cluster as soon as possible, create copies of all involved OSDs, start
the cluster again and attempt the recovery on the copies.

Regards,
Burkhard

Hi! Thanks for the info... I understand that this is a very
difficult and probably not feasible task, but in case I need to
try a recovery, what other info would I need? Can I somehow
find out on which OSDs the specific data were stored and
narrow my search to those?
Any ideas on how I should proceed?

First of all you need to know the exact object names for the RADOS
objects. As mentioned before, the name is composed of the RBD name and
an offset.

In the case of OpenStack, there are three different patterns for RBD
names:

- glance images, e.g. 50f2a0bd-15b1-4dbb-8d1f-fc43ce535f13
- nova images, e.g. 9aec1f45-9053-461e-b176-c65c25a48794_disk
- cinder volumes, e.g. volume-0ca52f58-7e75-4b21-8b0f-39cbcd431c42

(not considering snapshots etc., which might use different patterns)
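
The three naming patterns can be told apart mechanically. The following is only a minimal sketch built from the example names in this message; the function name classify_rbd_name and the assumption that the IDs are always lowercase hex UUIDs are illustrative and not taken from OpenStack itself.

```python
import re

# Assumption: OpenStack IDs are lowercase hex UUIDs, as in the examples above.
UUID = r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"

# Checked in order; each pattern must match the whole RBD name.
PATTERNS = [
    ("cinder", re.compile(rf"^volume-{UUID}$")),
    ("nova", re.compile(rf"^{UUID}_disk$")),
    ("glance", re.compile(rf"^{UUID}$")),
]


def classify_rbd_name(name):
    """Return 'glance', 'nova' or 'cinder', or None if no pattern matches."""
    for service, pattern in PATTERNS:
        if pattern.match(name):
            return service
    return None


if __name__ == "__main__":
    for example in (
        "50f2a0bd-15b1-4dbb-8d1f-fc43ce535f13",
        "9aec1f45-9053-461e-b176-c65c25a48794_disk",
        "volume-0ca52f58-7e75-4b21-8b0f-39cbcd431c42",
    ):
        print(example, "->", classify_rbd_name(example))
```

Snapshots, clones and any site-specific naming are deliberately not covered.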

The RBD chunks are created using a certain prefix (using examples
from our OpenStack setup):

# rbd -p os-images info 8fa3d9eb-91ed-4c60-9550-a62f34aed014
rbd image 8fa3d9eb-91ed-4c60-9550-a62f34aed014:
    size 446 MB in 56 objects
    order 23 (8192 kB objects)
    block_name_prefix: rbd_data.30e57d54dea573
    format: 2
    features: layering, striping
    flags:
    stripe unit: 8192 kB
    stripe count: 1

# rados -p os-images ls | grep rbd_data.30e57d54dea573
rbd_data.30e57d54dea573.0000000000000015
rbd_data.30e57d54dea573.0000000000000008
rbd_data.30e57d54dea573.000000000000000a
rbd_data.30e57d54dea573.000000000000002d
rbd_data.30e57d54dea573.0000000000000032

I don't know whether the prefix is derived from some other
information, but to recover the RBD you definitely need it.

_If_ you are able to recover the prefix, you can use ceph osd map
to find the OSDs for each chunk:

# ceph osd map os-images rbd_data.30e57d54dea573.000000000000001a
osdmap e418590 pool os-images (38) object rbd_data.30e57d54dea573.000000000000001a -> pg 38.d5d81d65 (38.65) -> up ([45,17,108], p45) acting ([45,17,108], p45)

With 20 OSDs in your case, you will likely have to process all of them
if the RBD has a size of several GBs.
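
If the mapping has to be looked up for many chunks, the output can be parsed rather than read by eye. This is a rough sketch that only understands the output layout shown above; the format may differ between Ceph versions, and the function name parse_osd_map is hypothetical.

```python
import re

# Matches the tail of the output shown above, e.g.
#   -> pg 38.d5d81d65 (38.65) -> up ([45,17,108], p45) acting ([45,17,108], p45)
OSD_MAP_RE = re.compile(
    r"-> pg (?P<raw_pg>\S+) \((?P<pg>[^)]+)\) "
    r"-> up \(\[(?P<up>[\d,]*)\], p(?P<up_primary>-?\d+)\) "
    r"acting \(\[(?P<acting>[\d,]*)\], p(?P<acting_primary>-?\d+)\)"
)


def _ids(csv):
    """Turn '45,17,108' into [45, 17, 108]; an empty string gives []."""
    return [int(x) for x in csv.split(",") if x]


def parse_osd_map(output):
    """Extract the PG and the up/acting OSD sets from 'ceph osd map' output."""
    flat = " ".join(output.split())  # undo line wrapping
    m = OSD_MAP_RE.search(flat)
    if m is None:
        raise ValueError("unrecognised 'ceph osd map' output")
    return {
        "pg": m.group("pg"),
        "up": _ids(m.group("up")),
        "up_primary": int(m.group("up_primary")),
        "acting": _ids(m.group("acting")),
        "acting_primary": int(m.group("acting_primary")),
    }


if __name__ == "__main__":
    sample = """osdmap e418590 pool os-images (38) object
    rbd_data.30e57d54dea573.000000000000001a -> pg 38.d5d81d65
    (38.65)
    -> up ([45,17,108], p45) acting ([45,17,108], p45)"""
    print(parse_osd_map(sample))
```

Note that this only reports where the current osdmap would place an object of that name; it says nothing about whether the deleted data is still on disk.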

Regards,
Burkhard

Is it possible to get the prefix if the RBD has already been
deleted? Is this info stored somewhere? Can I retrieve it in
some other way besides "rbd info"? Because when I try to get it
using the "rbd info" command, unfortunately I am getting the
following error:

"librbd::ImageCtx: error finding header: (2) No such file or directory"

Any ideas?

Best regards,

G.

Here is some more info from the cluster:

$ ceph df
GLOBAL:
    SIZE       AVAIL      RAW USED     %RAW USED
    74373G     72011G        2362G          3.18
POOLS:
    NAME                   ID     USED     %USED     MAX AVAIL     OBJECTS
    data                   3         0         0        35849G           0
    metadata               4      1884         0        35849G          20
    rbd                    5         0         0        35849G           0
    .rgw                   6      1374         0        35849G           8
    .rgw.control           7         0         0        35849G           8
    .rgw.gc                8         0         0        35849G          32
    .log                   9         0         0        35849G           0
    .intent-log            10        0         0        35849G           0
    .usage                 11        0         0        35849G           3
    .users                 12       33         0        35849G           3
    .users.email           13       22         0        35849G           2
    .users.swift           14       22         0        35849G           2
    .users.uid             15      985         0        35849G           4
    .rgw.root              16      840         0        35849G           3
    .rgw.buckets.index     17        0         0        35849G           4
    .rgw.buckets           18     170G      0.23        35849G      810128
    .rgw.buckets.extra     19        0         0        35849G           1
    volumes                20    1004G      1.35        35849G      262613

Obviously the RBD volumes provided to OpenStack are stored in the
"volumes" pool, so trying to figure out the prefix for the volume in
question, "volume-a490aa0c-6957-4ea2-bb5b-e4054d3765ad", produces the
following:

$ rbd -p volumes info volume-a490aa0c-6957-4ea2-bb5b-e4054d3765ad
rbd: error opening image volume-a490aa0c-6957-4ea2-bb5b-e4054d3765ad: (2) No such file or directory
2016-08-09 03:04:56.250977 7fa9ba1ca760 -1 librbd::ImageCtx: error finding header: (2) No such file or directory

On the other hand, for a volume that still exists and is working
normally, I get the following:

$ rbd -p volumes info volume-2383fc3a-2b6f-49b4-a3f5-f840569edb73
rbd image volume-2383fc3a-2b6f-49b4-a3f5-f840569edb73:
        size 500 GB in 128000 objects
        order 22 (4096 kB objects)
        block_name_prefix: rbd_data.fb1bb3136c3ec
        format: 2
        features: layering

and can also get the OSD mapping etc.
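
For the volumes that still exist, the prefix, order and object count can be pulled out of the "rbd info" text with a few regular expressions. This is only a sketch against the output layout shown above; the function name parse_rbd_info is hypothetical, and it cannot help with an image whose header is already gone. Collecting the prefixes of all surviving volumes might at least help to tell their objects apart from those of deleted ones.

```python
import re


def parse_rbd_info(output):
    """Extract prefix, order and object count from 'rbd info' text output."""
    prefix = re.search(r"block_name_prefix:\s*(\S+)", output)
    order = re.search(r"\border\s+(\d+)", output)
    objects = re.search(r"\bin\s+(\d+)\s+objects", output)
    if not (prefix and order and objects):
        raise ValueError("unrecognised 'rbd info' output")
    return {
        "block_name_prefix": prefix.group(1),
        "order": int(order.group(1)),
        "objects": int(objects.group(1)),
    }


if __name__ == "__main__":
    sample = """rbd image volume-2383fc3a-2b6f-49b4-a3f5-f840569edb73:
        size 500 GB in 128000 objects
        order 22 (4096 kB objects)
        block_name_prefix: rbd_data.fb1bb3136c3ec
        format: 2
        features: layering"""
    print(parse_rbd_info(sample))
```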

Does that mean that there is no way to find out on which OSDs the
deleted volume was placed?
If that's the case, then it's not possible to recover the data... Am I
right?

Any other ideas, people?

Looking forward to your comments... please...

Best regards,

G.
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
