Re: pgs stuck inactive and unclean, too few PGs per OSD

Hello,

On Thu, 8 Oct 2015 12:21:40 +0800 (CST) wikison wrote:

> Here, like this :
> esta@monitorOne:~$ sudo ceph osd tree
> ID WEIGHT  TYPE NAME            UP/DOWN REWEIGHT PRIMARY-AFFINITY
> -3 4.39996 root defualt
                 ^^^^^^^^
That's your problem. It should be "default" 

You manually edited the crush map, right?

> -2 1.09999     host storageTwo
>  0 0.09999         osd.0             up  1.00000          1.00000
>  1 1.00000         osd.1             up  1.00000          1.00000
> -4 1.09999     host storageFour
>  2 0.09999         osd.2             up  1.00000          1.00000
>  3 1.00000         osd.3             up  1.00000          1.00000
> -5 1.09999     host storageLast
>  4 0.09999         osd.4             up  1.00000          1.00000
>  5 1.00000         osd.5             up  1.00000          1.00000
> -6 1.09999     host storageOne
>  6 0.09999         osd.6             up  1.00000          1.00000
>  7 1.00000         osd.7             up  1.00000          1.00000
> -1       0 root default
>
Nothing under the default root, so the default rule to allocate PGs can't
find anything.
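
Assuming the hosts themselves are otherwise fine, the least invasive fix
is probably to move them under the default root with the CLI instead of
hand-editing the map again, along these lines (host names taken from
your tree output):

    ceph osd crush move storageOne root=default
    ceph osd crush move storageTwo root=default
    ceph osd crush move storageFour root=default
    ceph osd crush move storageLast root=default
    # the now-empty misspelled root can then be removed
    ceph osd crush remove defualt

Alternatively you can decompile, fix and re-inject the map with "ceph
osd getcrushmap", crushtool and "ceph osd setcrushmap", but the commands
above are less error-prone. Once the hosts hang off the default root,
"ceph -s" should show the PGs peering and going active+clean.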
 
Christian

> I have four storage nodes. Each of them has two independent drives to
> store data: one is a 120GB SSD, the other a 1TB HDD. I set the weight
> of the SSD to 0.1 and the weight of the HDD to 1.0.
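
(For reference, crush weights can also be adjusted at runtime if they
ever need changing, e.g.:

    ceph osd crush reweight osd.0 0.1
    ceph osd crush reweight osd.1 1.0

with the osd ids as shown in your tree.)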
> 
> 
> 
> 
> 
> --
> 
> Zhen Wang
> Shanghai Jiao Tong University
> 
> 
> 
> At 2015-10-08 11:32:52, "Christian Balzer" <chibi@xxxxxxx> wrote:
> >
> >Hello,
> >
> >On Thu, 8 Oct 2015 11:27:46 +0800 (CST) wikison wrote:
> >
> >> Hi,
> >>         I've removed the rbd pool and created it again. It picked up
> >> my default settings, but there are still some problems. After running
> >> "sudo ceph -s", the output is as follows: 
> >>     cluster 0b9b05db-98fe-49e6-b12b-1cce0645c015
> >>      health HEALTH_WARN
> >>             512 pgs stuck inactive
> >>             512 pgs stuck unclean
> >>      monmap e1: 1 mons at {monitorOne=192.168.1.153:6789/0}
> >>             election epoch 1, quorum 0 monitorOne
> >>      osdmap e62: 8 osds: 8 up, 8 in
> >>       pgmap v219: 512 pgs, 1 pools, 0 bytes data, 0 objects
> >>             8460 MB used, 4162 GB / 4171 GB avail
> >>                  512 creating
> >> 
> >Output of "ceph osd tree" please.
> >
> >The only reason I can think of is if your OSDs are up, but have no
> >weight.
> >
> >Christian
> >
> >> Ceph is stuck creating the pgs forever. Those pgs stay inactive
> >> and unclean, and "ceph pg query" hangs forever. I googled this
> >> problem and didn't get a clue. Is there anything I missed?
> >> Any ideas to help me?
> >> 
> >> 
> >> --
> >> 
> >> Zhen Wang
> >> 
> >> 
> >> 
> >> At 2015-10-07 13:05:51, "Christian Balzer" <chibi@xxxxxxx> wrote:
> >> >
> >> >Hello,
> >> >On Wed, 7 Oct 2015 12:57:58 +0800 (CST) wikison wrote:
> >> >
> >> >This is a very old bug/misfeature.
> >> >It creeps up every week or so here; google is your friend.
> >> >
> >> >> Hi, 
> >> >> I have a cluster of one monitor and eight OSDs. These OSDs are
> >> >> running on four hosts (each host has two OSDs). When I set up
> >> >> everything and started Ceph, I got this:
> >> >> esta@monitorOne:~$ sudo ceph -s
> >> >> [sudo] password for esta:
> >> >>     cluster 0b9b05db-98fe-49e6-b12b-1cce0645c015
> >> >>      health HEALTH_WARN
> >> >>             64 pgs stuck inactive
> >> >>             64 pgs stuck unclean
> >> >>             too few PGs per OSD (8 < min 30)
> >> >
> >> >Those 3 lines tell you pretty much all there is wrong.
> >> >You did (correctly) set the default pg and pgp nums to something
> >> >sensible (512) in your ceph.conf.
> >> >Unfortunately when creating the initial pool (rbd) it still ignores
> >> >those settings.
> >> >
> >> >You could try to increase those for your pool, which may or may not
> >> >work.
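
For the record, increasing them would be something like this (the pool
being rbd; note that pg_num can be raised but not decreased again):

    ceph osd pool set rbd pg_num 512
    ceph osd pool set rbd pgp_num 512

pg_num has to go first; pgp_num can then be raised to match.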
> >> >
> >> >The easier and faster way is to remove the rbd pool and create it
> >> >again. This should pick up your default settings.
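
A minimal sketch of that, with the usual caveat that deleting a pool
destroys any data in it (harmless here, the pool is empty):

    ceph osd pool delete rbd rbd --yes-i-really-really-mean-it
    ceph osd pool create rbd 512 512

Passing pg_num and pgp_num (512, matching your ceph.conf defaults)
explicitly avoids relying on them being picked up at all.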
> >> >
> >> >Christian
> >> >
> >> >>      monmap e1: 1 mons at {monitorOne=192.168.1.153:6789/0}
> >> >>             election epoch 1, quorum 0 monitorOne
> >> >>      osdmap e58: 8 osds: 8 up, 8 in
> >> >>       pgmap v191: 64 pgs, 1 pools, 0 bytes data, 0 objects
> >> >>             8460 MB used, 4162 GB / 4171 GB avail
> >> >>                   64 creating
> >> >> 
> >> >> 
> >> >> How to deal with this HEALTH_WARN status?
> >> >> This is my ceph.conf:
> >> >> [global]
> >> >> 
> >> >> 
> >> >>     fsid                        = 0b9b05db-98fe-49e6-b12b-1cce0645c015
> >> >> 
> >> >> 
> >> >>     mon initial members         = monitorOne
> >> >>     mon host                    = 192.168.1.153
> >> >>     filestore_xattr_use_omap    = true
> >> >> 
> >> >> 
> >> >>     public network              = 192.168.1.0/24
> >> >>     cluster network             = 10.0.0.0/24
> >> >>     pid file                    = /var/run/ceph/$name.pid
> >> >> 
> >> >> 
> >> >>     auth cluster required      = cephx
> >> >>     auth service required      = cephx
> >> >>     auth client required       = cephx
> >> >> 
> >> >> 
> >> >>     osd pool default size       = 3
> >> >>     osd pool default min size   = 2
> >> >>     osd pool default pg num     = 512
> >> >>     osd pool default pgp num    = 512
> >> >>     osd crush chooseleaf type   = 1
> >> >>     osd journal size            = 1024
> >> >> 
> >> >> 
> >> >> [mon]
> >> >> 
> >> >> 
> >> >> [mon.0]
> >> >>     host = monitorOne
> >> >>     mon addr = 192.168.1.153:6789
> >> >> 
> >> >> 
> >> >> [osd]
> >> >> 
> >> >> 
> >> >> [osd.0]
> >> >>     host = storageOne
> >> >> 
> >> >> 
> >> >> [osd.1]
> >> >>     host = storageTwo
> >> >> 
> >> >> 
> >> >> [osd.2]
> >> >>     host = storageFour
> >> >> 
> >> >> 
> >> >> [osd.3]
> >> >>     host = storageLast
> >> >> 
> >> >> Could anybody help me?
> >> >> 
> >> >> best regards,
> >> >> 
> >> >> --
> >> >> 
> >> >> Zhen Wang
> >> >
> >> >-- 
> >> >Christian Balzer        Network/Systems Engineer                
> >> >chibi@xxxxxxx   	Global OnLine Japan/Fusion Communications
> >> >http://www.gol.com/
> >
> >
> >-- 
> >Christian Balzer        Network/Systems Engineer                
> >chibi@xxxxxxx   	Global OnLine Japan/Fusion Communications
> >http://www.gol.com/


-- 
Christian Balzer        Network/Systems Engineer                
chibi@xxxxxxx   	Global OnLine Japan/Fusion Communications
http://www.gol.com/
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


