Re: Recover unfound objects from crashed OSD's underlying filesystem

Gregory Farnum <gfarnum@xxxxxxxxxx> · Wed, 17 Feb 2016 17:47:34 -0800

You probably don't want to try and replace the dead OSD with a new one until stuff is otherwise recovered. Just import the PG into any osd in the cluster and it should serve the data up for proper recovery (and then delete it when done).
I've never done this or worked on the tooling, though, so that's about the extent of my knowledge.
-Greg
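
For reference, the export/import sequence suggested above can be sketched as a dry run that only assembles the two ceph-objectstore-tool command lines and prints them. The flags and paths are the ones quoted further down in this thread. Everything else is an assumption made for illustration: the helper names, OSD id 104 for the crashed OSD (taken from the "osd is down" entry in the quoted query output), and OSD id 12 as a stand-in for "any osd in the cluster". As far as I know, the tool has to run while the OSD daemon it operates on is stopped, and the target should not already hold a copy of the PG.

```python
import shlex


def objectstore_tool(op, osd_id, export_file, pgid=None):
    """Build one ceph-objectstore-tool command line as an argument list."""
    # Path layout follows the /var/lib/ceph/osd/ceph-xx/ pattern from the thread.
    data_path = f"/var/lib/ceph/osd/ceph-{osd_id}/"
    cmd = ["ceph-objectstore-tool", "--op", op]
    if pgid is not None:
        cmd += ["--pgid", pgid]
    cmd += [
        "--data-path", data_path,
        "--journal-path", data_path + "journal",
        "--file", export_file,
    ]
    return cmd


def recovery_plan(pgid, crashed_osd, target_osd):
    """Export the PG from the crashed OSD, then import it into another OSD."""
    export_file = f"{pgid}.export"
    return [
        objectstore_tool("export", crashed_osd, export_file, pgid=pgid),
        # Assumption: the target OSD daemon is stopped before this step
        # and started again afterwards so the cluster can recover from it.
        objectstore_tool("import", target_osd, export_file),
    ]


# Dry run only: print the commands, execute nothing.
# OSD ids 104 and 12 are illustrative placeholders.
for cmd in recovery_plan("3.5a9", crashed_osd=104, target_osd=12):
    print(shlex.join(cmd))
```

Printing rather than executing keeps the sketch safe to run anywhere; the commands would still need to be reviewed and run by hand on the relevant hosts.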

On Wednesday, February 17, 2016, Kostis Fardelas <dante1234@xxxxxxxxx> wrote:
Right now the PG is served by two other OSDs and fresh data is written
to them. Is it safe to export the stale PG contents from the crashed
OSD and just import them back into the cluster (the PG is not entirely
lost, only some objects didn't make it)?

What would be the right sequence of commands in that case?

a. ceph-objectstore-tool --op export --pgid 3.5a9 --data-path
/var/lib/ceph/osd/ceph-xx/ --journal-path
/var/lib/ceph/osd/ceph-xx/journal --file 3.5a9.export
b. rm the crashed OSD, remove it from the crushmap and create a fresh
new one with the same ID
c. ceph-objectstore-tool --op import --data-path
/var/lib/ceph/osd/ceph-xx/ --journal-path
/var/lib/ceph/osd/ceph-xx/journal --file 3.5a9.export
d. start the OSD

Regards,
Kostis

On 18 February 2016 at 02:54, Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:

> On Wed, Feb 17, 2016 at 4:44 PM, Kostis Fardelas <dante1234@xxxxxxxxx> wrote:

>> Thanks Greg,

>> I gather from reading about ceph_objectstore_tool that it acts at the

>> level of the PG. The fact is that I do not want to wipe the whole PG,

>> only export certain objects (the unfound ones) and import them again

>> into the cluster. To be precise the pg with the unfound objects is

>> mapped like this:

>> osdmap e257960 pg 3.5a9 (3.5a9) -> up [86,30] acting [86]

>>

>> but by searching in the underlying filesystem of the crashed OSD, I can

>> verify that it contains the 4MB unfound objects which I get with pg

>> list_missing and cannot be found on any other probed OSD.

>>

>> Do you know if and how I could achieve this with ceph_objectstore_tool?

>

> You can't just pull out single objects. What you can do is export the

> entire PG containing the objects, import it into a random OSD, and

> then let the cluster recover from that OSD.

> (Assuming all the data you need is there — just because you can see

> the files on disk doesn't mean all the separate metadata is

> available.)

> -Greg

>

>>

>> Regards,

>> Kostis

>>

>>

>> On 18 February 2016 at 01:22, Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:

>>> On Wed, Feb 17, 2016 at 3:05 PM, Kostis Fardelas <dante1234@xxxxxxxxx> wrote:

>>>> Hello cephers,

>>>> due to an unfortunate sequence of events (disk crashes, network

>>>> problems), we are currently in a situation with one PG that reports

>>>> unfound objects. There is also an OSD which cannot start up and

>>>> crashes with the following:

>>>>

>>>> 2016-02-17 18:40:01.919546 7fecb0692700 -1 os/FileStore.cc: In

>>>> function 'virtual int FileStore::read(coll_t, const ghobject_t&,

>>>> uint64_t, size_t, ceph::bufferlist&, bool)

>>>> ' thread 7fecb0692700 time 2016-02-17 18:40:01.889980

>>>> os/FileStore.cc: 2650: FAILED assert(allow_eio ||

>>>> !m_filestore_fail_eio || got != -5)

>>>>

>>>> (There is probably a problem with the OSD's underlying disk storage)
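
The -5 in the failed assert above matches the negated errno value EIO (input/output error) on Linux, which fits the suspicion about the underlying disk. A quick check of that mapping, not Ceph-specific, with `got` simply mirroring the variable name in the assert:

```python
import errno
import os

got = -5  # the read return value that the assert rejects

# Map the positive errno number back to its symbolic name and message.
print(errno.errorcode[-got])  # EIO
print(os.strerror(-got))
```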

>>>>

>>>> By querying the PG that is stuck in

>>>> active+recovering+degraded+remapped state due to the unfound objects,

>>>> I understand that all possible OSDs are probed except for the crashed

>>>> one:

>>>>

>>>> "might_have_unfound": [

>>>>   { "osd": "30",

>>>>    "status": "already probed"},

>>>>   { "osd": "102",

>>>>    "status": "already probed"},

>>>>   { "osd": "104",

>>>>    "status": "osd is down"},

>>>>   { "osd": "105",

>>>>    "status": "already probed"},

>>>>   { "osd": "145",

>>>>     "status": "already probed"}],
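
The check described above (which OSDs have not been probed yet) can be sketched as a small filter over the "might_have_unfound" list. The fragment is copied from the output quoted here and wrapped in a top-level object purely for illustration; in real `ceph pg ... query` output the list sits deeper in the JSON, and the function name is hypothetical.

```python
import json

# Fragment from the thread, wrapped in braces so it parses on its own.
fragment = """
{"might_have_unfound": [
  {"osd": "30",  "status": "already probed"},
  {"osd": "102", "status": "already probed"},
  {"osd": "104", "status": "osd is down"},
  {"osd": "105", "status": "already probed"},
  {"osd": "145", "status": "already probed"}]}
"""


def unprobed_osds(section):
    """Return the ids of OSDs whose status is anything but 'already probed'."""
    return [
        int(entry["osd"])
        for entry in section.get("might_have_unfound", [])
        if entry["status"] != "already probed"
    ]


print(unprobed_osds(json.loads(fragment)))  # [104]
```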

>>>>

>>>> so I understand that the crashed OSD may have the latest version of

>>>> the objects. I can also verify that I can find the 4MB objects in

>>>> the underlying filesystem of the crashed OSD.

>>>>

>>>> By issuing ceph pg 3.5a9 list_missing, I get for all unfound objects,

>>>> information like this:

>>>>

>>>>         { "oid": { "oid":

>>>> "829d5be29cd6e231e7e951ba58ad3d0baf7fba65aad40083cef39bb03d5ec0fd",

>>>>               "key": "",

>>>>               "snapid": -2,

>>>>               "hash": 3880052137,

>>>>               "max": 0,

>>>>               "pool": 3,

>>>>               "namespace": ""},

>>>>           "need": "255658'37078125",

>>>>           "have": "255651'37077081",

>>>>           "locations": []}
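
To make the "need" and "have" fields above easier to compare, here is a minimal sketch that splits them into numbers. It assumes the strings follow the epoch'version layout (my reading of the format, not something stated in the thread); the function name is hypothetical.

```python
def parse_eversion(text):
    """Split a string such as "255658'37078125" into (epoch, version)."""
    epoch, version = text.split("'")
    return int(epoch), int(version)


# Values copied from the list_missing entry quoted in the thread.
entry = {"need": "255658'37078125", "have": "255651'37077081"}

need = parse_eversion(entry["need"])
have = parse_eversion(entry["have"])

# Tuples compare epoch first, then version.
print(need > have)  # True
```

Under that assumption, the copy the cluster currently has is older than the one it needs, which is why a revert would fall back to an earlier version of the object.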

>>>>

>>>>

>>>> My question is what is the best solution that I should follow?

>>>> a. Is there any way to export the objects from the crashed OSD's

>>>> filesystem and reimport it to the cluster? How could that be done?

>>>

>>> Look at ceph_objectstore_tool, e.g.,

>>> http://ceph-users.ceph.narkive.com/lwDkR2fZ/recovering-incomplete-pgs-with-ceph-objectstore-tool

>>>

>>>> b. If I issue ceph pg {pg-id} mark_unfound_lost revert, should I

>>>> expect that the "have" version of this object (thus an older version

>>>> of the object) will become enabled?

>>>

>>> It should, although I gather this sometimes takes some contortions for

>>> reasons I've never worked out.

>>> -Greg

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com