Re: FW: crashed+peering PGs

>> >> $ ceph pg dump -o - | grep crashed
>> >> pg_stat  objects  mip  degr  unf  kb  bytes  log  disklog  state            v          reported  up      acting  last_scrub
>> >> 1.1ac    0        0    0     0    0   0      0    0        crashed+peering  0'0        5869'576  [3,13]  [3,13]  0'0        2011-07-13 17:04:30.221618
>> >> 0.1ad    0        0    0     0    0   0      198  198      crashed+peering  3067'1194  5869'515  [3,13]  [3,13]  3067'1194  2011-07-13 17:04:29.221726
>> >> 2.1ab    0        0    0     0    0   0      0    0        crashed+peering  0'0        5869'576  [3,13]  [3,13]  0'0        2011-07-13 17:04:31.222145
>> >> 1.6c     0        0    0     0    0   0      0    0        crashed+peering  0'0        5869'577  [3,13]  [3,13]  0'0        2011-07-13 17:05:35.237286
>> >> 0.6d     0        0    0     0    0   0      198  198      crashed+peering  3067'636   5869'516  [3,13]  [3,13]  3067'636   2011-07-13 17:05:34.237024
>> >> 2.6b     0        0    0     0    0   0      0    0        crashed+peering  0'0        5869'577  [3,13]  [3,13]  0'0        2011-07-13 17:05:37.238474
>
> Strange, none of these PGs show up in those logs.  Can you do
>
> ceph pg map 1.1ac
>
> for each PG and see where the current CRUSH map thinks they should be
> stored?  That would be the node to look for them on.  You may also want to
> look for $osd_data/current/$pgid_head on all the OSDs to see where the
> copies are.
>
> The location in the pg dump (from the monitor's PGMap) is just the last
> reported location.  The primary normally sends stats updates several
> times a minute for each PG that is touched (and less frequently for
> those that are not).  So it's not necessarily bad that it doesn't
> match... but it is strange that no surviving copy is reporting updated
> information.

pg dump matches the data from pg map:

2011-07-18 09:41:02.340371 mon <- [pg,map,1.1ac]
2011-07-18 09:41:02.410063 mon0 -> 'osdmap e6517 pg 1.1ac (1.1ac) -> up [3,13] acting [3,13]' (0)
2011-07-18 09:41:02.434859 mon <- [pg,map,0.1ad]
2011-07-18 09:41:02.435546 mon1 -> 'osdmap e6517 pg 0.1ad (0.1ad) -> up [3,13] acting [3,13]' (0)
2011-07-18 09:41:02.442316 mon <- [pg,map,2.1ab]
2011-07-18 09:41:02.442839 mon1 -> 'osdmap e6517 pg 2.1ab (2.1ab) -> up [3,13] acting [3,13]' (0)
2011-07-18 09:41:02.449131 mon <- [pg,map,1.6c]
2011-07-18 09:41:02.449679 mon2 -> 'osdmap e6517 pg 1.6c (1.6c) -> up [3,13] acting [3,13]' (0)
2011-07-18 09:41:02.455090 mon <- [pg,map,0.6d]
2011-07-18 09:41:02.455429 mon0 -> 'osdmap e6517 pg 0.6d (0.6d) -> up [3,13] acting [3,13]' (0)
2011-07-18 09:41:02.461530 mon <- [pg,map,2.6b]
2011-07-18 09:41:02.462012 mon2 -> 'osdmap e6517 pg 2.6b (2.6b) -> up [3,13] acting [3,13]' (0)

I've also looked at the filesystem: the $pgid_head directories exist
neither on osd003 nor on osd013.
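
That check looks like this on each OSD host (a sketch; $osd_data stands
for whatever "osd data" points to in ceph.conf):

$ for pg in 1.1ac 0.1ad 2.1ab 1.6c 0.6d 2.6b; do ls -d $osd_data/current/${pg}_head; done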

I suspect that the PGs are empty because they belong to a pool that we
don't use, as we use Ceph exclusively for rbd.
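
The pool ID is the numeric prefix of the pgid (e.g. 1.1ac is in pool 1);
which pool that is can be checked from the osd dump, a sketch:

$ ceph osd dump -o - | grep pool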

Regards,
Christian