On Wed, Feb 17, 2016 at 3:05 PM, Kostis Fardelas <dante1234@xxxxxxxxx> wrote:
> Hello cephers,
> due to an unfortunate sequence of events (disk crashes, network
> problems), we are currently in a situation with one PG that reports
> unfound objects. There is also an OSD which cannot start up and
> crashes with the following:
>
> 2016-02-17 18:40:01.919546 7fecb0692700 -1 os/FileStore.cc: In
> function 'virtual int FileStore::read(coll_t, const ghobject_t&,
> uint64_t, size_t, ceph::bufferlist&, bool)
> ' thread 7fecb0692700 time 2016-02-17 18:40:01.889980
> os/FileStore.cc: 2650: FAILED assert(allow_eio ||
> !m_filestore_fail_eio || got != -5)
>
> (There is probably a problem with the OSD's underlying disk storage)
>
> By querying the PG that is stuck in
> active+recovering+degraded+remapped state due to the unfound objects,
> I understand that all possible OSDs are probed except for the crashed
> one:
>
> "might_have_unfound": [
>     { "osd": "30",
>       "status": "already probed"},
>     { "osd": "102",
>       "status": "already probed"},
>     { "osd": "104",
>       "status": "osd is down"},
>     { "osd": "105",
>       "status": "already probed"},
>     { "osd": "145",
>       "status": "already probed"}],
>
> so I understand that the crashed OSD may have the latest version of
> the objects. I can also verify that I can find the 4MB objects in
> the underlying filesystem of the crashed OSD.
>
> By issuing ceph pg 3.5a9 list_missing, I get for all unfound objects,
> information like this:
>
> { "oid": { "oid":
>       "829d5be29cd6e231e7e951ba58ad3d0baf7fba65aad40083cef39bb03d5ec0fd",
>       "key": "",
>       "snapid": -2,
>       "hash": 3880052137,
>       "max": 0,
>       "pool": 3,
>       "namespace": ""},
>   "need": "255658'37078125",
>   "have": "255651'37077081",
>   "locations": []}
>
>
> My question is what is the best solution that I should follow?
> a. Is there any way to export the objects from the crashed OSD's
> filesystem and reimport it to the cluster? How could that be done?

Look at ceph-objectstore-tool.
e.g. http://ceph-users.ceph.narkive.com/lwDkR2fZ/recovering-incomplete-pgs-with-ceph-objectstore-tool

> b. If I issue ceph pg {pg-id} mark_unfound_lost revert, should I
> expect that the "have" version of this object (thus an older version
> of the object) will become enabled?

It should, although I gather this sometimes takes some contortions for
reasons I've never worked out.
-Greg
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
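
For option (a), a rough sketch of what the export/import route might look
like, written as a small Python helper that only builds and prints the
command lines (it does not run anything). Several things here are
assumptions and not from the thread: the default /var/lib/ceph/osd/ceph-<id>
data path with the journal at <data-path>/journal, the dump file name, and
the choice of osd.30 as the import target. The --data-path, --journal-path,
--op, --pgid and --file options are the ones I believe ceph-objectstore-tool
accepts; check ceph-objectstore-tool --help on your version first. Both the
source and the target OSD daemons need to be stopped while the tool runs.

```python
import shlex

# The "might_have_unfound" fragment quoted in the question.
MIGHT_HAVE_UNFOUND = [
    {"osd": "30", "status": "already probed"},
    {"osd": "102", "status": "already probed"},
    {"osd": "104", "status": "osd is down"},
    {"osd": "105", "status": "already probed"},
    {"osd": "145", "status": "already probed"},
]


def unprobed_osds(might_have_unfound):
    """Return the ids of OSDs that have not been probed yet."""
    return [
        int(entry["osd"])
        for entry in might_have_unfound
        if entry["status"] != "already probed"
    ]


def osd_paths(osd_id, osd_root="/var/lib/ceph/osd"):
    """Assumed default FileStore layout; adjust for your deployment."""
    data_path = f"{osd_root}/ceph-{osd_id}"
    return data_path, f"{data_path}/journal"


def export_cmd(osd_id, pgid, dump_file):
    """Command to export one PG from a stopped OSD into dump_file."""
    data_path, journal_path = osd_paths(osd_id)
    return [
        "ceph-objectstore-tool",
        "--data-path", data_path,
        "--journal-path", journal_path,
        "--op", "export",
        "--pgid", pgid,
        "--file", dump_file,
    ]


def import_cmd(osd_id, dump_file):
    """Command to import a previously exported PG into a stopped OSD."""
    data_path, journal_path = osd_paths(osd_id)
    return [
        "ceph-objectstore-tool",
        "--data-path", data_path,
        "--journal-path", journal_path,
        "--op", "import",
        "--file", dump_file,
    ]


source_osds = unprobed_osds(MIGHT_HAVE_UNFOUND)
dump = "/root/pg-3.5a9.export"  # hypothetical dump location
for osd in source_osds:
    print(shlex.join(export_cmd(osd, "3.5a9", dump)))
# osd.30 is only an illustrative target, not a recommendation.
print(shlex.join(import_cmd(30, dump)))
```

The helper picks osd.104 as the export source because it is the only entry
whose status is not "already probed". Whether a whole-PG import is the right
move depends on the state of the target OSD (as far as I know the import
refuses to run if the PG already exists there), so read the thread linked
above before trying it on a production cluster.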