You'll probably have to recreate OSDs with the same ids (empty ones), let them boot, stop them, and mark them lost (a rough command sketch follows below the quoted message). There is a feature request in the tracker to improve this behavior: http://tracker.ceph.com/issues/10976
-Sam

On Mon, 2015-03-09 at 12:24 +0000, joel.merrick@xxxxxxxxx wrote:
> Hi,
>
> I'm trying to fix an issue within 0.93 on our internal cloud related
> to incomplete PGs (yes, I realise the folly of having the dev release
> - it's a not-so-test env now, so I need to recover this really). I'll
> detail the current outage info:
>
> 72 initial (now 65) OSDs
> 6 nodes
>
> * Update to 0.92 from Giant
> * Fine for a day
> * MDS outage overnight and subsequent node failure
> * Massive increase in RAM utilisation (10 GB per OSD!)
> * More failures
> * OSDs marked 'out' to try to alleviate the cluster's newly large
>   resource requirements; a couple died under the additional load
> * 'Superfluous and faulty' OSDs removed (ceph osd rm), auth keys deleted
> * RAM added to nodes (96 GB each, serving 10-12 OSDs)
> * Upgrade to 0.93
> * Fixed the journals broken by the 0.92 update
> * No more missing objects or degradation
>
> So, that brings me to today: I still have 73/2264 PGs listed as stuck
> incomplete/inactive. I also have requests that are blocked.
>
> Upon querying said placement groups, I notice that they are
> 'blocked_by' non-existent OSDs (ones I have removed due to issues).
> I have no way to tell them the OSD is lost, as it's already been
> removed from both the osdmap and the crushmap.
> Exporting the crushmap shows the non-existent OSDs as deviceN (e.g.
> device36 for the removed osd.36).
> Deleting those and reimporting the crushmap has no effect.
>
> Some further pg detail: https://gist.github.com/joelio/cecca9b48aca6d44451b
>
> So I'm stuck: I can't recover the PGs, because I can't remove a
> non-existent OSD that the PG thinks is blocking it.
>
> Help graciously accepted!
> Joel
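
Below is an untested sketch of the recreate-and-mark-lost sequence Sam describes at the top of this message, assuming a stock 0.93-era ceph CLI and sysvinit-style service scripts. The osd id, uuid, paths and caps are examples only; adapt them to your cluster and double-check each step before running it against live data.

    ID=36                         # one of the removed osd ids the PGs are blocked_by
    UUID=$(uuidgen)

    # Recreate an empty OSD entry so the osdmap knows about the id again.
    # If your version's 'osd create' doesn't accept an explicit id, plain
    # 'ceph osd create $UUID' hands out the lowest free id, which will be
    # the removed one as long as nothing lower is free.
    ceph osd create $UUID $ID

    # Give it an empty data dir plus a key, and register the key.
    mkdir -p /var/lib/ceph/osd/ceph-$ID
    ceph-osd -i $ID --mkfs --mkkey --osd-uuid $UUID
    ceph auth add osd.$ID osd 'allow *' mon 'allow profile osd' \
        -i /var/lib/ceph/osd/ceph-$ID/keyring

    # Let it boot, then stop it and mark it lost so the blocked PGs can
    # give up on it and continue peering.
    service ceph start osd.$ID    # or: /etc/init.d/ceph start osd.$ID
    service ceph stop osd.$ID
    ceph osd lost $ID --yes-i-really-mean-it

Repeat for each removed id that still shows up in the PGs' blocked_by lists.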
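
To see which PGs are still stuck and which ids they are waiting on, something along these lines should work (the pg id below is just a placeholder):

    # List the stuck PGs.
    ceph health detail
    ceph pg dump_stuck inactive

    # Inspect one of them; 'blocked_by' lists the osd ids it is waiting on.
    ceph pg 2.1f query | grep -A 5 blocked_by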
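
For reference, the crushmap round trip Joel describes (export, delete the stray deviceN lines, recompile, reimport) looks roughly like this. As he notes, it doesn't clear the blocked_by entries on its own, presumably because those come from the PGs' peering history rather than from the crushmap, which is why the mark-lost route above is needed:

    ceph osd getcrushmap -o crushmap.bin
    crushtool -d crushmap.bin -o crushmap.txt
    # edit crushmap.txt and remove the leftover deviceN lines, e.g. device36
    crushtool -c crushmap.txt -o crushmap.new
    ceph osd setcrushmap -i crushmap.new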