Hmm, can you post the monitor logs as well for that period? It looks
like the osd is requesting a map change and not getting it.
-Sam

On Fri, Apr 6, 2012 at 10:59 AM, Damien Churchill <damoxc@xxxxxxxxx> wrote:
> Oops, it was set to 600; chmod'd to 644, so it should be good now.
>
> On 6 April 2012 18:56, Samuel Just <sam.just@xxxxxxxxxxxxx> wrote:
>> Error 403.
>> -Sam
>>
>> On Fri, Apr 6, 2012 at 10:53 AM, Damien Churchill <damoxc@xxxxxxxxx> wrote:
>>> Okay, uploaded it to http://damoxc.net/ceph/osd.0.log.gz. I restarted
>>> the osd with debug osd = 20 and let it run for 5 minutes or so.
>>>
>>> On 6 April 2012 18:30, Samuel Just <sam.just@xxxxxxxxxxxxx> wrote:
>>>> Hmm, osd.0 should be enough. I need osd debugging at around 20 from
>>>> when the osd started. Restarting the osd with debugging at 20 would
>>>> also work fine.
>>>> -Sam
>>>>
>>>> On Fri, Apr 6, 2012 at 9:55 AM, Damien Churchill <damoxc@xxxxxxxxx> wrote:
>>>>> I've got that directory on 3 of the osds: 0, 3 and 4. Do you want the
>>>>> logs from all 3 of them?
>>>>>
>>>>> On 6 April 2012 17:50, Samuel Just <sam.just@xxxxxxxxxxxxx> wrote:
>>>>>> Is there a 0.138_head directory under current/ on any of your osds?
>>>>>> If so, can you post the log from that osd? I could also use the osd.0
>>>>>> log.
>>>>>> -Sam
>>>>>>
>>>>>> On Wed, Apr 4, 2012 at 2:44 PM, Damien Churchill <damoxc@xxxxxxxxx> wrote:
>>>>>>> I've uploaded them to:
>>>>>>>
>>>>>>> http://damoxc.net/ceph/osdmap
>>>>>>> http://damoxc.net/ceph/pg_dump
>>>>>>>
>>>>>>> Thanks
>>>>>>>
>>>>>>> On 4 April 2012 21:51, Samuel Just <sam.just@xxxxxxxxxxxxx> wrote:
>>>>>>>> Can you post a copy of your osd map and the output of 'ceph pg dump'?
>>>>>>>> You can get the osdmap via 'ceph osd getmap -o <filename>'.
>>>>>>>> -Sam
>>>>>>>>
>>>>>>>> On Wed, Apr 4, 2012 at 1:12 AM, Damien Churchill <damoxc@xxxxxxxxx> wrote:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I'm having some trouble getting some pgs to stop being inactive. The
>>>>>>>>> cluster is running 0.44.1 and the kernel version is 3.2.x.
>>>>>>>>>
>>>>>>>>> ceph -s reports:
>>>>>>>>> 2012-04-04 09:08:57.816029    pg v188540: 990 pgs: 223 inactive, 767
>>>>>>>>> active+clean; 205 GB data, 1013 GB used, 8204 GB / 9315 GB avail
>>>>>>>>> 2012-04-04 09:08:57.817970   mds e2198: 1/1/1 up {0=node24=up:active},
>>>>>>>>> 4 up:standby
>>>>>>>>> 2012-04-04 09:08:57.818024   osd e5910: 5 osds: 5 up, 5 in
>>>>>>>>> 2012-04-04 09:08:57.818201   log 2012-04-04 09:04:03.838358 osd.3
>>>>>>>>> 172.22.10.24:6801/30000 159 : [INF] 0.13d scrub ok
>>>>>>>>> 2012-04-04 09:08:57.818280   mon e7: 3 mons at
>>>>>>>>> {node21=172.22.10.21:6789/0,node22=172.22.10.22:6789/0,node23=172.22.10.23:6789/0}
>>>>>>>>>
>>>>>>>>> ceph health says:
>>>>>>>>> 2012-04-04 09:09:01.651053 mon <- [health]
>>>>>>>>> 2012-04-04 09:09:01.666585 mon.1 -> 'HEALTH_WARN 223 pgs stuck
>>>>>>>>> inactive; 223 pgs stuck unclean' (0)
>>>>>>>>>
>>>>>>>>> I was wondering if anyone has any suggestions about how to resolve
>>>>>>>>> this, or things to look for. I've tried restarting the ceph daemons on
>>>>>>>>> the various nodes a few times, to no avail. I don't think there is
>>>>>>>>> anything wrong with any of the nodes either.
>>>>>>>>>
>>>>>>>>> Thanks in advance,
>>>>>>>>> Damien
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
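
A rough sketch of the commands referenced in this thread, for anyone hitting
the same issue. The file paths, the [osd.0] config section, and the
sysvinit-style restart are illustrative assumptions, not taken from the
poster's actual setup:

    # Raise OSD logging by adding to ceph.conf on the node running osd.0,
    # then restart that daemon so the log covers OSD startup:
    #   [osd.0]
    #       debug osd = 20
    /etc/init.d/ceph restart osd.0       # or: service ceph restart osd.0

    # Capture the current osdmap and a pg dump to share, as requested above:
    ceph osd getmap -o /tmp/osdmap
    ceph pg dump > /tmp/pg_dump

    # If serving the files over HTTP, make them world-readable
    # (the 403 earlier in the thread came from mode 600):
    chmod 644 osd.0.log.gz /tmp/osdmap /tmp/pg_dump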