Re: pgs stuck inactive

Damien Churchill <damoxc@xxxxxxxxx> · Fri, 6 Apr 2012 18:53:08 +0100

Okay, uploaded it to http://damoxc.net/ceph/osd.0.log.gz. I restarted
the osd with debug osd = 20 and let it run for 5 minutes or so.
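
A log captured at debug osd = 20 gets large quickly, so a small filter can help when looking for a single PG such as 0.138. The following is only a minimal Python sketch, not part of Ceph: the function names are made up for illustration, and it assumes the log is gzip-compressed text in which the PG id appears literally (for example as 0.138 or 0.138_head). It makes no other assumptions about the log line format.

```python
import gzip
import re
from typing import Iterable, Iterator


def pg_pattern(pgid: str):
    """Build a regex that matches pgid only as a whole PG id.

    PG ids look like "<pool>.<hex>", so reject matches that are glued to
    further hex digits or a dot (e.g. "0.1380" or "10.138").
    """
    return re.compile(r"(?<![0-9a-f.])" + re.escape(pgid) + r"(?![0-9a-f])")


def lines_mentioning_pg(lines: Iterable[str], pgid: str) -> Iterator[str]:
    """Yield every line (without trailing newline) that mentions pgid."""
    pat = pg_pattern(pgid)
    for line in lines:
        if pat.search(line):
            yield line.rstrip("\n")


def scan_gzip_log(path: str, pgid: str) -> Iterator[str]:
    """Stream a gzip-compressed text log and yield lines mentioning pgid."""
    # errors="replace" keeps the scan going if the log has odd bytes in it.
    with gzip.open(path, "rt", errors="replace") as fh:
        yield from lines_mentioning_pg(fh, pgid)
```

Usage would be along the lines of `for line in scan_gzip_log("osd.0.log.gz", "0.138"): print(line)`, where the local file name is assumed to match the uploaded log.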

On 6 April 2012 18:30, Samuel Just <sam.just@xxxxxxxxxxxxx> wrote:
> Hmm, osd.0 should be enough.  I need osd debugging at around 20 from
> when the osd started.  Restarting the osd with debugging at 20 would
> also work fine.
> -Sam
>
> On Fri, Apr 6, 2012 at 9:55 AM, Damien Churchill <damoxc@xxxxxxxxx> wrote:
>> I've got that directory on 3 of the osds: 0, 3 and 4. Do you want the
>> logs for all 3 of them?
>>
>> On 6 April 2012 17:50, Samuel Just <sam.just@xxxxxxxxxxxxx> wrote:
>>> Is there a 0.138_head directory under current/ on any of your osds?
>>> If so, can you post the log for that osd?  I could also use the osd.0
>>> log.
>>> -Sam
>>>
>>> On Wed, Apr 4, 2012 at 2:44 PM, Damien Churchill <damoxc@xxxxxxxxx> wrote:
>>>> I've uploaded them to:
>>>>
>>>> http://damoxc.net/ceph/osdmap
>>>> http://damoxc.net/ceph/pg_dump
>>>>
>>>> Thanks
>>>>
>>>> On 4 April 2012 21:51, Samuel Just <sam.just@xxxxxxxxxxxxx> wrote:
>>>>> Can you post a copy of your osd map and the output of 'ceph pg dump'?
>>>>> You can get the osdmap via 'ceph osd getmap -o <filename>'.
>>>>> -Sam
>>>>>
>>>>> On Wed, Apr 4, 2012 at 1:12 AM, Damien Churchill <damoxc@xxxxxxxxx> wrote:
>>>>>> Hi,
>>>>>>
>>>>>> I'm having some trouble getting some pgs to stop being inactive. The
>>>>>> cluster is running 0.44.1 and the kernel version is 3.2.x.
>>>>>>
>>>>>> ceph -s reports:
>>>>>> 2012-04-04 09:08:57.816029    pg v188540: 990 pgs: 223 inactive, 767
>>>>>> active+clean; 205 GB data, 1013 GB used, 8204 GB / 9315 GB avail
>>>>>> 2012-04-04 09:08:57.817970   mds e2198: 1/1/1 up {0=node24=up:active},
>>>>>> 4 up:standby
>>>>>> 2012-04-04 09:08:57.818024   osd e5910: 5 osds: 5 up, 5 in
>>>>>> 2012-04-04 09:08:57.818201   log 2012-04-04 09:04:03.838358 osd.3
>>>>>> 172.22.10.24:6801/30000 159 : [INF] 0.13d scrub ok
>>>>>> 2012-04-04 09:08:57.818280   mon e7: 3 mons at
>>>>>> {node21=172.22.10.21:6789/0,node22=172.22.10.22:6789/0,node23=172.22.10.23:6789/0}
>>>>>>
>>>>>> ceph health says:
>>>>>> 2012-04-04 09:09:01.651053 mon <- [health]
>>>>>> 2012-04-04 09:09:01.666585 mon.1 -> 'HEALTH_WARN 223 pgs stuck
>>>>>> inactive; 223 pgs stuck unclean' (0)
>>>>>>
>>>>>> I was wondering if anyone has any suggestions about how to resolve
>>>>>> this, or things to look for. I've tried restarting the ceph daemons on
>>>>>> the various nodes a few times to no avail. I don't think that there is
>>>>>> anything wrong with any of the nodes either.
>>>>>>
>>>>>> Thanks in advance,
>>>>>> Damien
>>>>>> --
>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>>> the body of a message to majordomo@xxxxxxxxxxxxxxx
>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html