Re: pgs stuck inactive

Samuel Just <sam.just@xxxxxxxxxxxxx> · Fri, 6 Apr 2012 10:30:52 -0700

Hmm, osd.0 should be enough.  I need its log with osd debugging at around 20 from
when the osd started.  Restarting the osd with debugging at 20 would
also work fine.
-Sam
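For reference, one way to keep osd debugging at 20 across a restart is a `debug osd = 20` line in the `[osd]` section of ceph.conf. The Python sketch below only makes that edit with configparser. It assumes your ceph.conf parses as plain INI, and configparser drops comments when it writes the file back out, so keep a copy of the original.

```python
import configparser
import io


def enable_osd_debug(conf_text: str, level: int = 20) -> str:
    """Return ceph.conf text with 'debug osd = <level>' set in [osd].

    Sketch only: comments are lost and indentation is normalised
    when configparser writes the config back out.
    """
    # interpolation=None so values containing '%' are left untouched.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    if not parser.has_section("osd"):
        parser.add_section("osd")
    # Option names with spaces ("debug osd") are ordinary keys here.
    parser.set("osd", "debug osd", str(level))
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


if __name__ == "__main__":
    # "example-host" is a placeholder, not a host from this thread.
    example = "[global]\n\thost = example-host\n"
    print(enable_osd_debug(example))
```

After writing the new file out, restart the osd so the log covers startup.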

On Fri, Apr 6, 2012 at 9:55 AM, Damien Churchill <damoxc@xxxxxxxxx> wrote:
> I've got that directory on 3 of the osds: 0, 3 and 4. Do you want the
> logs from all 3 of them?
>
> On 6 April 2012 17:50, Samuel Just <sam.just@xxxxxxxxxxxxx> wrote:
>> Is there a 0.138_head directory under current/ on any of your osds?
>> If so, can you post the log for that osd?  I could also use the osd.0
>> log.
>> -Sam
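A quick way to answer the 0.138_head question on each node is to check every local OSD data directory for `current/<pgid>_head`. This Python sketch assumes you pass in the OSD data directories yourself; the `/srv/osd.*` pattern mentioned in the comment is hypothetical and should be replaced by whatever your `osd data` setting points at.

```python
import glob
import os
import tempfile


def osds_with_pg_dir(osd_data_dirs, pgid):
    """Return the OSD data dirs whose current/ holds a <pgid>_head directory."""
    return [
        d for d in osd_data_dirs
        if os.path.isdir(os.path.join(d, "current", f"{pgid}_head"))
    ]


if __name__ == "__main__":
    # On a real node you would pass e.g. glob.glob("/srv/osd.*")
    # (hypothetical path). Here a throwaway tree stands in for it.
    with tempfile.TemporaryDirectory() as root:
        for osd, pg in (("osd.0", "0.138"), ("osd.1", "0.13d")):
            os.makedirs(os.path.join(root, osd, "current", f"{pg}_head"))
        dirs = sorted(glob.glob(os.path.join(root, "osd.*")))
        print(osds_with_pg_dir(dirs, "0.138"))  # only the osd.0 directory
```

Since each OSD in this cluster lives on its own node, this would be run once per node.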
>>
>> On Wed, Apr 4, 2012 at 2:44 PM, Damien Churchill <damoxc@xxxxxxxxx> wrote:
>>> I've uploaded them to:
>>>
>>> http://damoxc.net/ceph/osdmap
>>> http://damoxc.net/ceph/pg_dump
>>>
>>> Thanks
>>>
>>> On 4 April 2012 21:51, Samuel Just <sam.just@xxxxxxxxxxxxx> wrote:
>>>> Can you post a copy of your osd map and the output of 'ceph pg dump'?
>>>>  You can get the osdmap via 'ceph osd getmap -o <filename>'.
>>>> -Sam
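A small sketch that gathers both files in one go, assuming the `ceph` CLI is on PATH and can reach the monitors. The output names `osdmap` and `pg_dump` simply mirror the uploads above and are otherwise arbitrary.

```python
import subprocess


def diagnostic_commands(osdmap_path):
    """argv lists for the osdmap and 'ceph pg dump' requested in this thread."""
    return [
        ["ceph", "osd", "getmap", "-o", osdmap_path],  # writes the map to osdmap_path
        ["ceph", "pg", "dump"],                         # prints to stdout
    ]


def collect(osdmap_path="osdmap", pg_dump_path="pg_dump"):
    """Run both commands; needs a working 'ceph' CLI and monitor access."""
    getmap, pg_dump = diagnostic_commands(osdmap_path)
    subprocess.run(getmap, check=True)
    with open(pg_dump_path, "w") as out:
        subprocess.run(pg_dump, check=True, stdout=out)


if __name__ == "__main__":
    # Print the commands only; call collect() on a node with cluster access.
    for argv in diagnostic_commands("osdmap"):
        print(" ".join(argv))
```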
>>>>
>>>> On Wed, Apr 4, 2012 at 1:12 AM, Damien Churchill <damoxc@xxxxxxxxx> wrote:
>>>>> Hi,
>>>>>
>>>>> I'm having some trouble getting some pgs to stop being inactive. The
>>>>> cluster is running 0.44.1 and the kernel version is 3.2.x.
>>>>>
>>>>> ceph -s reports:
>>>>> 2012-04-04 09:08:57.816029    pg v188540: 990 pgs: 223 inactive, 767
>>>>> active+clean; 205 GB data, 1013 GB used, 8204 GB / 9315 GB avail
>>>>> 2012-04-04 09:08:57.817970   mds e2198: 1/1/1 up {0=node24=up:active},
>>>>> 4 up:standby
>>>>> 2012-04-04 09:08:57.818024   osd e5910: 5 osds: 5 up, 5 in
>>>>> 2012-04-04 09:08:57.818201   log 2012-04-04 09:04:03.838358 osd.3
>>>>> 172.22.10.24:6801/30000 159 : [INF] 0.13d scrub ok
>>>>> 2012-04-04 09:08:57.818280   mon e7: 3 mons at
>>>>> {node21=172.22.10.21:6789/0,node22=172.22.10.22:6789/0,node23=172.22.10.23:6789/0}
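To watch whether the inactive count moves while restarting daemons, the pg summary line from `ceph -s` is easy to pull apart. This Python sketch assumes the 0.44-era layout shown above (`N pgs: <count> <state>, ...; ...`) with the line un-wrapped; other releases format it differently, so treat it as illustrative only.

```python
import re


def pg_state_counts(pg_line):
    """Return (total, {state: count}) from a 'ceph -s' pg summary line."""
    total = int(re.search(r"(\d+) pgs:", pg_line).group(1))
    # Keep only the state list between "pgs:" and the first ";".
    states_part = pg_line.split("pgs:", 1)[1].split(";", 1)[0]
    counts = {}
    for item in states_part.split(","):
        n, state = item.strip().split(" ", 1)
        counts[state] = int(n)
    return total, counts


if __name__ == "__main__":
    line = ("2012-04-04 09:08:57.816029    pg v188540: 990 pgs: 223 inactive, "
            "767 active+clean; 205 GB data, 1013 GB used, 8204 GB / 9315 GB avail")
    total, counts = pg_state_counts(line)
    print(total, counts)  # 990 {'inactive': 223, 'active+clean': 767}
```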
>>>>>
>>>>> ceph health says:
>>>>> 2012-04-04 09:09:01.651053 mon <- [health]
>>>>> 2012-04-04 09:09:01.666585 mon.1 -> 'HEALTH_WARN 223 pgs stuck
>>>>> inactive; 223 pgs stuck unclean' (0)
>>>>>
>>>>> I was wondering if anyone has any suggestions about how to resolve
>>>>> this, or things to look for. I've tried restarting the ceph daemons on
>>>>> the various nodes a few times, to no avail. I don't think that there is
>>>>> anything wrong with any of the nodes either.
>>>>>
>>>>> Thanks in advance,
>>>>> Damien
>>>>> --
>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>> the body of a message to majordomo@xxxxxxxxxxxxxxx
>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html