2011/7/18 Sage Weil <sage@xxxxxxxxxxxx>:
> On Mon, 18 Jul 2011, Christian Brunner wrote:
>> >> >> $ ceph pg dump -o - | grep crashed
>> >> >> pg_stat objects mip degr unf kb bytes log disklog state v reported up acting last_scrub
>> >> >> 1.1ac 0 0 0 0 0 0 0 0 crashed+peering 0'0 5869'576 [3,13] [3,13] 0'0 2011-07-13 17:04:30.221618
>> >> >> 0.1ad 0 0 0 0 0 0 198 198 crashed+peering 3067'1194 5869'515 [3,13] [3,13] 3067'1194 2011-07-13 17:04:29.221726
>> >> >> 2.1ab 0 0 0 0 0 0 0 0 crashed+peering 0'0 5869'576 [3,13] [3,13] 0'0 2011-07-13 17:04:31.222145
>> >> >> 1.6c 0 0 0 0 0 0 0 0 crashed+peering 0'0 5869'577 [3,13] [3,13] 0'0 2011-07-13 17:05:35.237286
>> >> >> 0.6d 0 0 0 0 0 0 198 198 crashed+peering 3067'636 5869'516 [3,13] [3,13] 3067'636 2011-07-13 17:05:34.237024
>> >> >> 2.6b 0 0 0 0 0 0 0 0 crashed+peering 0'0 5869'577 [3,13] [3,13] 0'0 2011-07-13 17:05:37.238474
>> >
>> > Strange, none of these PGs show up in those logs. Can you do
>> >
>> > ceph pg map 1.1ac
>> >
>> > for each PG and see where the current CRUSH map thinks they should be
>> > stored? That would be the node to look for them on. You may also want
>> > to look for $osd_data/current/$pgid_head on all the OSDs to see where
>> > the copies are.
>> >
>> > The location in the pg dump (from the monitors' PGMap) is just the
>> > last reported location. The primary for each PG normally sends stats
>> > updates several times a minute for PGs that are touched (and less
>> > frequently for those that are not). So it's not necessarily bad that
>> > it doesn't match... but it is strange that no surviving copy is
>> > reporting updated information.
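The per-PG check Sage suggests can be looped over all six stuck PGs. A minimal dry-run sketch (it only prints the commands; pipe the output to sh on a node with monitor access to actually run them):

```shell
# The six PGs stuck in crashed+peering, taken from the pg dump above.
pgs="1.1ac 0.1ad 2.1ab 1.6c 0.6d 2.6b"

# Print a "ceph pg map" command for each PG. Running these shows where
# the current CRUSH map places each PG (up/acting sets).
for pg in $pgs; do
    echo "ceph pg map $pg"
done
```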
>> pg dump matches the data from pg map:
>>
>> 2011-07-18 09:41:02.340371 mon <- [pg,map,1.1ac]
>> 2011-07-18 09:41:02.410063 mon0 -> 'osdmap e6517 pg 1.1ac (1.1ac) -> up [3,13] acting [3,13]' (0)
>> 2011-07-18 09:41:02.434859 mon <- [pg,map,0.1ad]
>> 2011-07-18 09:41:02.435546 mon1 -> 'osdmap e6517 pg 0.1ad (0.1ad) -> up [3,13] acting [3,13]' (0)
>> 2011-07-18 09:41:02.442316 mon <- [pg,map,2.1ab]
>> 2011-07-18 09:41:02.442839 mon1 -> 'osdmap e6517 pg 2.1ab (2.1ab) -> up [3,13] acting [3,13]' (0)
>> 2011-07-18 09:41:02.449131 mon <- [pg,map,1.6c]
>> 2011-07-18 09:41:02.449679 mon2 -> 'osdmap e6517 pg 1.6c (1.6c) -> up [3,13] acting [3,13]' (0)
>> 2011-07-18 09:41:02.455090 mon <- [pg,map,0.6d]
>> 2011-07-18 09:41:02.455429 mon0 -> 'osdmap e6517 pg 0.6d (0.6d) -> up [3,13] acting [3,13]' (0)
>> 2011-07-18 09:41:02.461530 mon <- [pg,map,2.6b]
>> 2011-07-18 09:41:02.462012 mon2 -> 'osdmap e6517 pg 2.6b (2.6b) -> up [3,13] acting [3,13]' (0)
>>
>> I've also looked at the filesystem: the $pgid_head directories exist
>> neither on osd003 nor on osd013.

> Does it exist on any other nodes?

No, it doesn't exist on any node.

> Did the osd crash you mentioned happen at the end (when you started
> seeing these 6 pgs misbehave), or did it recover fully after that, and
> only do this after a later OSD was reformatted?

The crash happened during the rebuild. The rebuild finished with these 6
PGs in state "crashed+peering". Everything else was fine (no degraded
objects).

My suspicion is that these PGs were skipped in the rebuild from osd013
to osd003 because of the crash. After that I reformatted osd013, which
might explain why these PGs are missing on osd013, too.

Is there a way to create a PG manually?

Thanks,
Christian
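The cluster-wide filesystem check discussed above (looking for surviving $pgid_head copies on every node) can be scripted. A dry-run sketch that prints one listing command per host and PG; the hostnames and the osd data path are placeholders for this cluster's actual values:

```shell
# PGs to look for; hosts are examples -- extend to every OSD host.
pgs="1.1ac 0.1ad 2.1ab 1.6c 0.6d 2.6b"
hosts="osd003 osd013"

# Print an ssh command per host/PG that would list any surviving
# *_head directory. Replace /path/to/osd_data with the real $osd_data
# location before piping the output to sh.
for host in $hosts; do
    for pg in $pgs; do
        echo "ssh $host ls -d /path/to/osd_data/current/${pg}_head"
    done
done
```

Any host where the listing succeeds still holds a copy of that PG's data.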