On Mon, 18 Jul 2011, Christian Brunner wrote:
> >> >> $ ceph pg dump -o - | grep crashed
> >> >> pg_stat objects mip degr unf kb bytes log disklog state           v          reported  up      acting  last_scrub
> >> >> 1.1ac   0       0   0    0   0  0     0   0       crashed+peering 0'0        5869'576  [3,13]  [3,13]  0'0        2011-07-13 17:04:30.221618
> >> >> 0.1ad   0       0   0    0   0  0     198 198     crashed+peering 3067'1194  5869'515  [3,13]  [3,13]  3067'1194  2011-07-13 17:04:29.221726
> >> >> 2.1ab   0       0   0    0   0  0     0   0       crashed+peering 0'0        5869'576  [3,13]  [3,13]  0'0        2011-07-13 17:04:31.222145
> >> >> 1.6c    0       0   0    0   0  0     0   0       crashed+peering 0'0        5869'577  [3,13]  [3,13]  0'0        2011-07-13 17:05:35.237286
> >> >> 0.6d    0       0   0    0   0  0     198 198     crashed+peering 3067'636   5869'516  [3,13]  [3,13]  3067'636   2011-07-13 17:05:34.237024
> >> >> 2.6b    0       0   0    0   0  0     0   0       crashed+peering 0'0        5869'577  [3,13]  [3,13]  0'0        2011-07-13 17:05:37.238474
> >
> > Strange, none of these PGs show up in those logs.  Can you do
> >
> >   ceph pg map 1.1ac
> >
> > for each PG and see where the current CRUSH map thinks they should be
> > stored?  That would be the node to look for them on.  You may also want
> > to look for $osd_data/current/$pgid_head on all the OSDs to see where
> > the copies are.
> >
> > The location in the pg dump (from the monitor's PGMap) is just the last
> > reported location.  Primaries normally send stats updates several times
> > a minute for each PG that is touched (and less frequently for those that
> > are not).  So it's not necessarily bad that it doesn't match... but it is
> > strange that no surviving copy is reporting updated information.
>
> pg dump matches the data from pg map:
>
> 2011-07-18 09:41:02.340371 mon <- [pg,map,1.1ac]
> 2011-07-18 09:41:02.410063 mon0 -> 'osdmap e6517 pg 1.1ac (1.1ac) -> up [3,13] acting [3,13]' (0)
> 2011-07-18 09:41:02.434859 mon <- [pg,map,0.1ad]
> 2011-07-18 09:41:02.435546 mon1 -> 'osdmap e6517 pg 0.1ad (0.1ad) -> up [3,13] acting [3,13]' (0)
> 2011-07-18 09:41:02.442316 mon <- [pg,map,2.1ab]
> 2011-07-18 09:41:02.442839 mon1 -> 'osdmap e6517 pg 2.1ab (2.1ab) -> up [3,13] acting [3,13]' (0)
> 2011-07-18 09:41:02.449131 mon <- [pg,map,1.6c]
> 2011-07-18 09:41:02.449679 mon2 -> 'osdmap e6517 pg 1.6c (1.6c) -> up [3,13] acting [3,13]' (0)
> 2011-07-18 09:41:02.455090 mon <- [pg,map,0.6d]
> 2011-07-18 09:41:02.455429 mon0 -> 'osdmap e6517 pg 0.6d (0.6d) -> up [3,13] acting [3,13]' (0)
> 2011-07-18 09:41:02.461530 mon <- [pg,map,2.6b]
> 2011-07-18 09:41:02.462012 mon2 -> 'osdmap e6517 pg 2.6b (2.6b) -> up [3,13] acting [3,13]' (0)
>
> I've also looked at the filesystem: the $pgid_head directories exist
> neither on osd003 nor on osd013.

Do they exist on any other nodes?

Did the OSD crash you mentioned happen right when you started seeing these
6 PGs misbehave, or did the cluster recover fully after the crash and only
start doing this later, after an OSD was reformatted?

sage

> I suspect that the PGs are empty because they belong to a pool that we
> don't use, as we use ceph exclusively for rbd.
>
> Regards,
> Christian
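
(For reference, a minimal sketch of the directory check suggested above.
$osd_data here is an assumption: it stands for whatever "osd data" points to
in ceph.conf on the OSD in question, so adjust the path and run the loop once
on each OSD host.  The PG ids are the six from the dump in this thread.

    for pg in 1.1ac 0.1ad 2.1ab 1.6c 0.6d 2.6b; do
        # report any PG head directory that still exists on this OSD's data disk
        [ -d "$osd_data/current/${pg}_head" ] && echo "$(hostname): has ${pg}_head"
    done

Any host that prints a line still holds an on-disk copy of that PG; if no
host prints anything, there is no surviving copy to recover from.)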