Re: pgs stuck inactive and unclean, too few PGs per OSD

One possibility is that the CRUSH map is not being populated. Look at your /etc/ceph/ceph.conf file and see whether you have something under the [osd] section (it could be in [global] too) that looks like the following:

osd crush update on start = false

If that line is there, and you're not maintaining the CRUSH map manually or via automation, remove it or comment it out: it stops OSDs from being added to the CRUSH map automatically at startup. Another thing to check is the following:

osd crush chooseleaf type = 1

The above is the default, I believe, but I have seen some people set it to 3 (rack, etc.), and that causes issues unless you have modified the CRUSH map accordingly.
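To verify what the cluster actually ended up with, something like the following should work (standard ceph/crushtool commands; nothing here is specific to this cluster):

    # Show the CRUSH hierarchy; with "osd crush chooseleaf type = 1" (host),
    # you need at least as many host buckets as your replica count (3 here).
    ceph osd tree

    # Dump and decompile the full CRUSH map for closer inspection.
    ceph osd getcrushmap -o /tmp/crushmap
    crushtool -d /tmp/crushmap -o /tmp/crushmap.txt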

-Chris

On Wed, Oct 7, 2015 at 11:27 PM, wikison <wikison@xxxxxxx> wrote:
Hi,
        I've removed the rbd pool and created it again. It picked up my default settings but there are still some problems.
        After running "sudo ceph -s", the output is as follows:
 
    cluster 0b9b05db-98fe-49e6-b12b-1cce0645c015
     health HEALTH_WARN
            512 pgs stuck inactive
            512 pgs stuck unclean
     monmap e1: 1 mons at {monitorOne=192.168.1.153:6789/0}
            election epoch 1, quorum 0 monitorOne
     osdmap e62: 8 osds: 8 up, 8 in
      pgmap v219: 512 pgs, 1 pools, 0 bytes data, 0 objects
            8460 MB used, 4162 GB / 4171 GB avail
                 512 creating

Ceph is stuck creating the PGs forever. Those PGs are stuck inactive and unclean, and "ceph pg query" hangs forever.
I googled this problem but didn't find a clue.
Is there anything I missed?
Any ideas to help me?
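For reference, these generic ceph CLI commands usually show why PGs are stuck in creating (none of them is specific to the cluster above):

    ceph health detail            # list the stuck PGs and the reported reason
    ceph pg dump_stuck inactive   # PGs that never became active
    ceph pg dump_stuck unclean    # PGs not replicated to the configured size
    ceph osd tree                 # confirm every OSD sits under its host bucket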

--
Zhen Wang


At 2015-10-07 13:05:51, "Christian Balzer" <chibi@xxxxxxx> wrote:

>Hello,
>
>On Wed, 7 Oct 2015 12:57:58 +0800 (CST) wikison wrote:
>
>This is a very old bug/misfeature.
>It creeps up every week or so here; google is your friend.
>
>> Hi,
>> I have a cluster of one monitor and eight OSDs. These OSDs are running
>> on four hosts (each host has two OSDs). When I set up everything and
>> started Ceph, I got this:
>>
>> esta@monitorOne:~$ sudo ceph -s
>> [sudo] password for esta:
>>     cluster 0b9b05db-98fe-49e6-b12b-1cce0645c015
>>      health HEALTH_WARN
>>             64 pgs stuck inactive
>>             64 pgs stuck unclean
>>             too few PGs per OSD (8 < min 30)
>
>Those 3 lines tell you pretty much all there is wrong.
>You did (correctly) set the default pg and pgp nums to something sensible
>(512) in your ceph.conf.
>Unfortunately, when creating the initial pool (rbd), Ceph still ignores
>those settings.
>
>You could try to increase those for your pool, which may or may not work.
>
>The easier and faster way is to remove the rbd pool and create it again.
>This should pick up your default settings.
>
>Christian
>
>>      monmap e1: 1 mons at {monitorOne=192.168.1.153:6789/0}
>>             election epoch 1, quorum 0 monitorOne
>>      osdmap e58: 8 osds: 8 up, 8 in
>>       pgmap v191: 64 pgs, 1 pools, 0 bytes data, 0 objects
>>             8460 MB used, 4162 GB / 4171 GB avail
>>                  64 creating
>>
>> How to deal with this HEALTH_WARN status?
>> This is my ceph.conf:
>>
>> [global]
>> fsid = 0b9b05db-98fe-49e6-b12b-1cce0645c015
>> mon initial members = monitorOne
>> mon host = 192.168.1.153
>> filestore_xattr_use_omap = true
>> public network = 192.168.1.0/24
>> cluster network = 10.0.0.0/24
>> pid file = /var/run/ceph/$name.pid
>> auth cluster required = cephx
>> auth service required = cephx
>> auth client required = cephx
>> osd pool default size = 3
>> osd pool default min size = 2
>> osd pool default pg num = 512
>> osd pool default pgp num = 512
>> osd crush chooseleaf type = 1
>> osd journal size = 1024
>>
>> [mon]
>>
>> [mon.0]
>> host = monitorOne
>> mon addr = 192.168.1.153:6789
>>
>> [osd]
>>
>> [osd.0]
>> host = storageOne
>>
>> [osd.1]
>> host = storageTwo
>>
>> [osd.2]
>> host = storageFour
>>
>> [osd.3]
>> host = storageLast
>>
>> Could anybody help me?
>>
>> best regards,
>>
>> --
>> Zhen Wang
>
>--
>Christian Balzer           Network/Systems Engineer
>chibi@xxxxxxx              Global OnLine Japan/Fusion Communications
>http://www.gol.com/
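For reference, the two options Christian describes would look roughly like this (pool name and PG counts taken from the thread; note that deleting the pool destroys any data in it):

    # Option 1: try raising the PG counts on the existing pool.
    ceph osd pool set rbd pg_num 512
    ceph osd pool set rbd pgp_num 512

    # Option 2: remove the rbd pool and recreate it with the desired PG count,
    # matching the "osd pool default pg num" setting from ceph.conf.
    ceph osd pool delete rbd rbd --yes-i-really-really-mean-it
    ceph osd pool create rbd 512 512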


 






--
Best Regards,
Chris Jones

(p) 770.655.0770

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
