Re: Recover unfound objects from crashed OSD's underlying filesystem

Kostis Fardelas <dante1234@xxxxxxxxx> · Thu, 18 Feb 2016 12:09:03 +0200

Can it be any OSD, or must it be one of those that the PG reports as probed?
Do you know if there is a way to force probing for a PG besides
restarting an OSD? It also doesn't need to be an empty OSD, I guess.

I also suppose that trying to copy the objects manually is not going to work:
a. either by just using "cp" and trying to keep all the xattrs in
place, since something is going to get missed in leveldb or another layer
b. or by extracting the "data" from the objects (I don't know if or how
that is possible) and using rados put to create the object from
the extracted data file
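
For comparison, the PG-level route discussed further down the thread (export the PG from the crashed OSD, import it into another OSD) might look roughly like the dry-run sketch below. It only prints the commands; nothing is executed. The flags are the ones already quoted in this thread. The ceph-SRC and ceph-DST directories are placeholders, not real paths, and I am assuming that the target OSD has to be stopped while the tool runs and that it does not already hold a copy of the PG.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands instead of running them.
# PGID is taken from this thread; the two OSD directories are placeholders.

PGID="3.5a9"
SRC_OSD_DIR="/var/lib/ceph/osd/ceph-SRC"   # placeholder: data dir of the crashed OSD
DST_OSD_DIR="/var/lib/ceph/osd/ceph-DST"   # placeholder: data dir of a stopped, healthy OSD
EXPORT_FILE="${PGID}.export"

# Print the export command for the crashed OSD (flags as quoted in this thread).
export_cmd() {
    printf 'ceph-objectstore-tool --op export --pgid %s --data-path %s --journal-path %s/journal --file %s\n' \
        "$PGID" "$SRC_OSD_DIR" "$SRC_OSD_DIR" "$EXPORT_FILE"
}

# Print the import command for the target OSD (assumed to be stopped).
import_cmd() {
    printf 'ceph-objectstore-tool --op import --data-path %s --journal-path %s/journal --file %s\n' \
        "$DST_OSD_DIR" "$DST_OSD_DIR" "$EXPORT_FILE"
}

export_cmd
import_cmd
```

Printing the commands first makes it easy to review the paths and the export file name before touching either OSD.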

Kostis

On 18 February 2016 at 03:47, Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:
> You probably don't want to try and replace the dead OSD with a new one until
> stuff is otherwise recovered. Just import the PG into any osd in the cluster
> and it should serve the data up for proper recovery (and then delete it when
> done).
>
> I've never done this or worked on the tooling though, so that's about the
> extent of my knowledge.
> -Greg
>
>
> On Wednesday, February 17, 2016, Kostis Fardelas <dante1234@xxxxxxxxx>
> wrote:
>>
>> Right now the PG is served by two other OSDs and fresh data is written
>> to them. Is it safe to export the stale PG contents from the crashed
>> OSD and just import them back into the cluster (the PG is
>> not entirely lost, only some objects didn't make it)?
>>
>> What could be the right sequence of commands in that case?
>> a. ceph-objectstore-tool --op export --pgid 3.5a9 --data-path
>> /var/lib/ceph/osd/ceph-xx/ --journal-path
>> /var/lib/ceph/osd/ceph-xx/journal --file 3.5a9.export
>> b. rm the crashed OSD, remove it from the crushmap and create a fresh new
>> one with the same ID
>> c. ceph-objectstore-tool --op import --data-path
>> /var/lib/ceph/osd/ceph-xx/ --journal-path
>> /var/lib/ceph/osd/ceph-xx/journal --file 3.5a9.export
>> d. start the osd
>>
>> Regards,
>> Kostis
>>
>>
>> On 18 February 2016 at 02:54, Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:
>> > On Wed, Feb 17, 2016 at 4:44 PM, Kostis Fardelas <dante1234@xxxxxxxxx>
>> > wrote:
>> >> Thanks Greg,
>> >> I gather from reading about ceph_objectstore_tool that it acts at the
>> >> level of the PG. The fact is that I do not want to wipe the whole PG,
>> >> only export certain objects (the unfound ones) and import them again
>> >> into the cluster. To be precise the pg with the unfound objects is
>> >> mapped like this:
>> >> osdmap e257960 pg 3.5a9 (3.5a9) -> up [86,30] acting [86]
>> >>
>> >> but by searching in the underlying filesystem of the crashed OSD, I can
>> >> verify that it contains the 4MB unfound objects which I get with pg
>> >> list_missing and which cannot be found on any other probed OSD.
>> >>
>> >> Do you know if and how I could achieve this with ceph_objectstore_tool?
>> >
>> > You can't just pull out single objects. What you can do is export the
>> > entire PG containing the objects, import it into a random OSD, and
>> > then let the cluster recover from that OSD.
>> > (Assuming all the data you need is there — just because you can see
>> > the files on disk doesn't mean all the separate metadata is
>> > available.)
>> > -Greg
>> >
>> >>
>> >> Regards,
>> >> Kostis
>> >>
>> >>
>> >> On 18 February 2016 at 01:22, Gregory Farnum <gfarnum@xxxxxxxxxx>
>> >> wrote:
>> >>> On Wed, Feb 17, 2016 at 3:05 PM, Kostis Fardelas <dante1234@xxxxxxxxx>
>> >>> wrote:
>> >>>> Hello cephers,
>> >>>> due to an unfortunate sequence of events (disk crashes, network
>> >>>> problems), we are currently in a situation with one PG that reports
>> >>>> unfound objects. There is also an OSD which cannot start-up and
>> >>>> crashes with the following:
>> >>>>
>> >>>> 2016-02-17 18:40:01.919546 7fecb0692700 -1 os/FileStore.cc: In
>> >>>> function 'virtual int FileStore::read(coll_t, const ghobject_t&,
>> >>>> uint64_t, size_t, ceph::bufferlist&, bool)
>> >>>> ' thread 7fecb0692700 time 2016-02-17 18:40:01.889980
>> >>>> os/FileStore.cc: 2650: FAILED assert(allow_eio ||
>> >>>> !m_filestore_fail_eio || got != -5)
>> >>>>
>> >>>> (There is probably a problem with the OSD's underlying disk storage)
>> >>>>
>> >>>> By querying the PG that is stuck in
>> >>>> active+recovering+degraded+remapped state due to the unfound objects,
>> >>>> I understand that all possible OSDs are probed except for the crashed
>> >>>> one:
>> >>>>
>> >>>> "might_have_unfound": [
>> >>>>   { "osd": "30",
>> >>>>    "status": "already probed"},
>> >>>>   { "osd": "102",
>> >>>>    "status": "already probed"},
>> >>>>   { "osd": "104",
>> >>>>    "status": "osd is down"},
>> >>>>   { "osd": "105",
>> >>>>    "status": "already probed"},
>> >>>>   { "osd": "145",
>> >>>>     "status": "already probed"}],
>> >>>>
>> >>>> so I understand that the crashed OSD may have the latest version of
>> >>>> the objects. I can also verify that I can find the 4MB objects in
>> >>>> the underlying filesystem of the crashed OSD.
>> >>>>
>> >>>> By issuing ceph pg 3.5a9 list_missing, I get for all unfound objects,
>> >>>> information like this:
>> >>>>
>> >>>>         { "oid": { "oid":
>> >>>> "829d5be29cd6e231e7e951ba58ad3d0baf7fba65aad40083cef39bb03d5ec0fd",
>> >>>>               "key": "",
>> >>>>               "snapid": -2,
>> >>>>               "hash": 3880052137,
>> >>>>               "max": 0,
>> >>>>               "pool": 3,
>> >>>>               "namespace": ""},
>> >>>>           "need": "255658'37078125",
>> >>>>           "have": "255651'37077081",
>> >>>>           "locations": []}
>> >>>>
>> >>>>
>> >>>> My question is what is the best solution that I should follow?
>> >>>> a. Is there any way to export the objects from the crashed OSD's
>> >>>> filesystem and reimport it to the cluster? How could that be done?
>> >>>
>> >>> Look at ceph_objectstore_tool, e.g.:
>> >>>
>> >>> http://ceph-users.ceph.narkive.com/lwDkR2fZ/recovering-incomplete-pgs-with-ceph-objectstore-tool
>> >>>
>> >>>> b. If I issue ceph pg {pg-id} mark_unfound_lost revert, should I
>> >>>> expect that the "have" version of this object (thus an older version
>> >>>> of the object) will become enabled?
>> >>>
>> >>> It should, although I gather this sometimes takes some contortions for
>> >>> reasons I've never worked out.
>> >>> -Greg
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com