Re: Troubleshooting Incomplete PGs

On Wed, Oct 22, 2014 at 3:09 PM, Chris Kitzmiller <ckitzmiller@xxxxxxxxxxxxx> wrote:
On Oct 22, 2014, at 1:50 PM, Craig Lewis wrote:
> Incomplete means "Ceph detects that a placement group is missing a necessary period of history from its log. If you see this state, report a bug, and try to start any failed OSDs that may contain the needed information".
>
> In the PG query, it lists some OSDs that it's trying to probe:
>           "probing_osds": [
>                 "10",
>                 "13",
>                 "15",
>                 "25"],
>           "down_osds_we_would_probe": [],
>
> Is one of those the OSD you replaced?  If so, you might try ceph pg {pg-id} mark_unfound_lost revert|delete.  That command will lose data; it tells Ceph to give up looking for data that it can't find, so you might want to wait a bit.

Yes. osd.10 was the OSD I replaced. :( I suspect that I didn't actually have any writes during this time and that a revert might leave me in an OK place.

Looking at the query more closely, I see that all of the peers have the same values for last_update/last_complete/last_scrub/last_deep_scrub, except that the peer entry for osd.10 has 0 values for everything. It's as if all my OSDs believe in the ghost of this PG on osd.10. I'd like to revert; I just want to make sure that I'm going to revert to the sane values and not the 0 values.
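
For a quick side-by-side look, something like this should pull those fields out of the query output (the PG id 2.5 below is just a placeholder for the incomplete PG):

    # dump the full query, then eyeball last_update/last_complete per peer
    ceph pg 2.5 query > pg-2.5.json
    grep -En '"peer"|"last_update"|"last_complete"' pg-2.5.json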


I've never (successfully) used mark_unfound_lost, so I can't say exactly what'll happen. `revert` should be what you need, but I don't know whether it will revert to the point in time before the hole in the history appeared, or simply give up on the portions of history that it doesn't have.
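
If it helps, a rough sequence for confirming what's actually unfound before telling Ceph to give up on it would be something like the following (untested here, and the PG id 2.5 is again just a placeholder):

    ceph health detail | grep -i unfound       # which PGs report unfound objects
    ceph pg 2.5 list_unfound                   # list the objects Ceph can't locate
    ceph pg 2.5 query | less                   # double-check the peers' last_update first
    ceph pg 2.5 mark_unfound_lost revert       # roll unfound objects back to prior versions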

 

> There's also the possibility that your crushmap is still not correct.  In the history, I can see that you had bad CRUSH maps in the past.  Stuff like
>   "recovery_state": [
>         { "name": "Started\/Primary\/Peering",
>           "enter_time": "2014-10-21 12:18:48.482666",
>           "past_intervals": [
>                 { "first": 4663,
>                   "last": 4685,
>                   "maybe_went_rw": 1,
>                   "up": [],
>                   "acting": [
>                         10,
>                         25,
>                         10,
>                         -1]},
>
> shows that CRUSH placed some data on osd.10 twice, which is a sign of a bad crushmap.  You might run through the crushtool testing at http://ceph.com/docs/master/man/8/crushtool/, just to make sure everything is kosher.

I checked with `crushtool --test -i crush.bin --show-bad-mappings`, which showed errors for mappings above 6 replicas (which I'd expect with my particular map) but nothing else. Modifying max_size to 6 gives clean output. I'm guessing this was some hiccup related to the OSD dying.
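
For completeness, the full round trip for that check looks roughly like this (the crush.bin/crush.txt filenames, replica count, and x range below are just example values):

    ceph osd getcrushmap -o crush.bin             # export the compiled crushmap
    crushtool -d crush.bin -o crush.txt           # decompile it for manual inspection
    crushtool --test -i crush.bin --show-bad-mappings \
        --num-rep 3 --min-x 0 --max-x 1023        # replay placements at 3 replicas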

Also, what's up with osd.-1?

You got me on that one.  I've never seen that in any of my clusters.
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
