Hi Samuel,

I've now applied your fix and it worked. Everything is active+clean
again.

Thanks,
Christian

2011/7/20 Samuel Just <samuelj@xxxxxxxxxxxxxxx>:
> I've now pushed a patch to master implementing a hack to allow a pg to
> be forced into creating.
>
>     ceph pg force_create_pg <pg>
>
> will now reset that pg's status to creating. This should cause the osd
> to recreate the pg.
>
> Let us know if it helps!
>
> 019955a1f40944ed489fd613abe7ef4f3671fb4b implements the hack.
>
> -Sam
>
> On 07/19/2011 03:21 PM, Samuel Just wrote:
>>
>> Sage may correct me on this, but:
>>
>> The trouble is probably that an osd is informed about its
>> responsibilities towards a pg by the osds which previously handled
>> that pg. If all of those have been wiped/removed, the new osd won't
>> be notified. The monitor can send an MOSDPGCreate message, but
>> currently won't, since we don't want to lose data in case the osds
>> with that pg come back. I don't think we have a way to address this
>> at the moment. You might be able to fix it by removing the pools and
>> recreating them if they don't have any data. If you want to wait for
>> a bit, I'm going to add a command to the monitor to allow you to
>> re-mark these as creating.
>>
>> -Sam
>>
>> On 07/19/2011 08:41 AM, Christian Brunner wrote:
>>>
>>> 2011/7/18 Sage Weil <sage@xxxxxxxxxxxx>:
>>>>
>>>> On Mon, 18 Jul 2011, Christian Brunner wrote:
>>>>>>>>>
>>>>>>>>> $ ceph pg dump -o - | grep crashed
>>>>>>>>> pg_stat objects mip degr unf kb bytes log disklog state           v         reported up     acting last_scrub
>>>>>>>>> 1.1ac   0       0   0    0   0  0     0   0       crashed+peering 0'0       5869'576 [3,13] [3,13] 0'0       2011-07-13 17:04:30.221618
>>>>>>>>> 0.1ad   0       0   0    0   0  0     198 198     crashed+peering 3067'1194 5869'515 [3,13] [3,13] 3067'1194 2011-07-13 17:04:29.221726
>>>>>>>>> 2.1ab   0       0   0    0   0  0     0   0       crashed+peering 0'0       5869'576 [3,13] [3,13] 0'0       2011-07-13 17:04:31.222145
>>>>>>>>> 1.6c    0       0   0    0   0  0     0   0       crashed+peering 0'0       5869'577 [3,13] [3,13] 0'0       2011-07-13 17:05:35.237286
>>>>>>>>> 0.6d    0       0   0    0   0  0     198 198     crashed+peering 3067'636  5869'516 [3,13] [3,13] 3067'636  2011-07-13 17:05:34.237024
>>>>>>>>> 2.6b    0       0   0    0   0  0     0   0       crashed+peering 0'0       5869'577 [3,13] [3,13] 0'0       2011-07-13 17:05:37.238474
>>>>>>
>>>>>> Strange, none of these PGs show up in those logs. Can you do
>>>>>>
>>>>>>     ceph pg map 1.1ac
>>>>>>
>>>>>> for each PG and see where the current CRUSH map thinks they should
>>>>>> be stored? That would be the node to look for them on. You may
>>>>>> also want to look for $osd_data/current/$pgid_head on all the OSDs
>>>>>> to see where the copies are.
>>>>>>
>>>>>> The location in the pg dump (from the monitor's PGMap) is just the
>>>>>> last reported location. Primaries for each PG normally send stats
>>>>>> updates several times a minute for each PG that is touched (and
>>>>>> less frequently for those that are not). So it's not necessarily
>>>>>> bad that it doesn't match... but it is strange that no surviving
>>>>>> copy is reporting updated information.
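(Side note for anyone hitting this later: the per-PG check Sage
suggests can be scripted instead of run by hand. A minimal sketch,
assuming the pgid is the first column of each crashed+peering row in
the pg dump listing above:

    ceph pg dump -o - | grep crashed | awk '{print $1}' | while read pgid; do
        ceph pg map "$pgid"
    done
)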
>>>>> pg dump matches the data from pg map:
>>>>>
>>>>> 2011-07-18 09:41:02.340371 mon <- [pg,map,1.1ac]
>>>>> 2011-07-18 09:41:02.410063 mon0 -> 'osdmap e6517 pg 1.1ac (1.1ac) -> up [3,13] acting [3,13]' (0)
>>>>> 2011-07-18 09:41:02.434859 mon <- [pg,map,0.1ad]
>>>>> 2011-07-18 09:41:02.435546 mon1 -> 'osdmap e6517 pg 0.1ad (0.1ad) -> up [3,13] acting [3,13]' (0)
>>>>> 2011-07-18 09:41:02.442316 mon <- [pg,map,2.1ab]
>>>>> 2011-07-18 09:41:02.442839 mon1 -> 'osdmap e6517 pg 2.1ab (2.1ab) -> up [3,13] acting [3,13]' (0)
>>>>> 2011-07-18 09:41:02.449131 mon <- [pg,map,1.6c]
>>>>> 2011-07-18 09:41:02.449679 mon2 -> 'osdmap e6517 pg 1.6c (1.6c) -> up [3,13] acting [3,13]' (0)
>>>>> 2011-07-18 09:41:02.455090 mon <- [pg,map,0.6d]
>>>>> 2011-07-18 09:41:02.455429 mon0 -> 'osdmap e6517 pg 0.6d (0.6d) -> up [3,13] acting [3,13]' (0)
>>>>> 2011-07-18 09:41:02.461530 mon <- [pg,map,2.6b]
>>>>> 2011-07-18 09:41:02.462012 mon2 -> 'osdmap e6517 pg 2.6b (2.6b) -> up [3,13] acting [3,13]' (0)
>>>>>
>>>>> I've also looked at the filesystem: the $pgid_head directories
>>>>> exist neither on osd003 nor on osd013.
>>>>
>>>> Do they exist on any other node?
>>>
>>> No, they don't exist on any node.
>>>
>>>> Did the osd crash you mentioned happen at the end (when you started
>>>> seeing these 6 pgs misbehave), or did it recover fully after that,
>>>> and only do this after a later OSD was reformatted?
>>>
>>> The crash happened during the rebuild. The rebuild finished with
>>> these 6 PGs in state "crashed+peering". Everything else was fine (no
>>> degraded objects).
>>>
>>> My suspicion is that these PGs were skipped in the rebuild from
>>> osd013 to osd003 because of the crash. After that I reformatted
>>> osd013, which might explain why these PGs are missing on osd013, too.
>>>
>>> Is there a way to create a PG manually?
>>>
>>> Thanks,
>>> Christian
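For the archive: with Sam's patch in place, the whole recovery can be
scripted. This is a sketch only, assuming the same pg dump column
layout as in the listing above. Note that force_create_pg gives up any
chance of recovering data from the old copies, so it should only be run
once you are sure, as Christian verified here, that no copy of the PG
survives on any node:

    # re-mark every PG stuck in crashed+peering as creating
    ceph pg dump -o - | grep crashed | awk '{print $1}' | while read pgid; do
        ceph pg map "$pgid"               # confirm where CRUSH maps the pg
        ceph pg force_create_pg "$pgid"   # reset its status to creating
    done

The OSDs in the acting set should then recreate the PGs, and the
cluster should return to active+clean, as reported at the top of this
thread.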