Placement groups forever in "creating" state and don't map to OSD


 



Ideally you should see all your OSD hosts in the #buckets section, and
those hosts should contain their respective OSDs. But you mentioned you
are using an old release of Ceph, 0.56, is it? I am not sure if the OSD
crush map format was different in that release.
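
From the map pasted below, osd.0 and osd.1 sit directly inside the root
bucket, while every rule does "step chooseleaf firstn 0 type host"; with
no host buckets for CRUSH to choose from, that is most likely why the
PGs never map to an OSD, which matches the "pgid currently maps to no
osd" output further down the thread. A minimal sketch of a #buckets
section with host buckets (the hostnames slesceph2 and slesceph3 are
placeholders, substitute your real OSD hosts):

host slesceph2 {
        id -2           # do not change unnecessarily
        alg straw
        hash 0  # rjenkins1
        item osd.0 weight 0.500
}
host slesceph3 {
        id -3           # do not change unnecessarily
        alg straw
        hash 0  # rjenkins1
        item osd.1 weight 0.500
}
root default {
        id -1           # do not change unnecessarily
        alg straw
        hash 0  # rjenkins1
        item slesceph2 weight 0.500
        item slesceph3 weight 0.500
}

After editing the decompiled map, it can be recompiled and injected with
something like:

crushtool -c filename.txt -o filename.new
ceph osd setcrushmap -i filename.new

after which the stuck PGs should start peering.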



Regards,
Kapil.
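
An alternative, if this is only a two-OSD test setup and restructuring
the map is not worth the trouble: let CRUSH pick individual OSDs instead
of hosts. This keeps osd.0 and osd.1 directly under the root but gives
up host-level failure separation. A sketch of the change to each of the
data/metadata/rbd rules:

        step take default
        step chooseleaf firstn 0 type osd
        step emit

Either way, once the rules can actually resolve to OSDs, the 202 PGs
stuck in "creating" should go active+clean.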



On Mon, 2014-08-04 at 10:37 +0000, Yogesh_Devi at Dell.com wrote:
> Hi Kapil,
> 
> The crush map is below 
> 
> 
> # begin crush map
> 
> # devices
> 
> device 0 osd.0
> 
> device 1 osd.1
> 
> # types
> 
> type 0 osd
> 
> type 1 host
> 
> type 2 rack
> 
> type 3 row
> 
> type 4 room
> 
> type 5 datacenter
> 
> type 6 root
> 
>  
> 
> # buckets
> 
> root default {
> 
>         id -1           # do not change unnecessarily
> 
>         # weight 1.000
> 
>         alg straw
> 
>         hash 0  # rjenkins1
> 
>         item osd.0 weight 0.500
> 
>         item osd.1 weight 0.500
> 
> }
> 
>  
> 
> # rules
> 
> rule data {
> 
>         ruleset 0
> 
>         type replicated
> 
>         min_size 1
> 
>         max_size 10
> 
>         step take default
> 
>         step chooseleaf firstn 0 type host
> 
>         step emit
> 
> }
> 
> rule metadata {
> 
>         ruleset 1
> 
>         type replicated
> 
>         min_size 1
> 
>         max_size 10
> 
>         step take default
> 
>         step chooseleaf firstn 0 type host
> 
>         step emit
> 
> }
> 
> rule rbd {
> 
>         ruleset 2
> 
>         type replicated
> 
>         min_size 1
> 
>         max_size 10
> 
>         step take default
> 
>         step chooseleaf firstn 0 type host
> 
>         step emit
> 
> }
> 
> # end crush map
> 
> 
>  
> 
> Yogesh Devi, 
> Architect,  Dell Cloud Clinical Archive
> Dell
> 
>  
> 
> Land Phone     +91 80 28413000 Extension - 2781 
> Hand Phone    +91 99014 71082 
> 
>  
> 
> -----Original Message----- 
> From: Kapil Sharma [mailto:ksharma at suse.com]
> Sent: Monday, August 04, 2014 3:59 PM 
> To: Devi, Yogesh 
> Cc: matt at cactuar.net; ceph-users at lists.ceph.com; Pulicken, Antony 
> Subject: Re: [ceph-users] Placement groups forever in "creating" state
> and dont map to OSD 
> 
> I think "ceph osd tree" should list your OSDs under the node buckets. 
> Could you also check your osd crush map with this:
> 
> ceph osd getcrushmap -o filename 
> crushtool -d filename -o filename.txt 
> 
> You should see your OSDs in the #devices section and your three
> servers in the #buckets section. 
> 
>  
> 
> Regards, 
> Kapil. 
> 
> 
> 
> On Mon, 2014-08-04 at 10:09 +0000, Yogesh_Devi at Dell.com wrote: 
> > Dell - Internal Use - Confidential 
> > 
> > Hi Kapil 
> > 
> > Thanks for responding :) 
> > 
> > My mon server and two OSDs are running on three separate servers, one
> > for each respective node. All are SLES 11 SP3. 
> > 
> >  
> > 
> > Below is "ceph osd tree" output from my mon server box: 
> > 
> >  
> > 
> > slesceph1: # ceph osd tree 
> > 
> > # id    weight  type name       up/down reweight 
> > 
> > -1      1       root default 
> > 
> > 0       0.5             osd.0   up      1 
> > 
> > 1       0.5             osd.1   up      1 
> > 
> > Yogesh 
> > 
> >  
> > 
> > Land Phone     +91 80 28413000 Extension - 2781 
> > Hand Phone    +91 99014 71082 
> > 
> >  
> > 
> > -----Original Message----- 
> > From: Kapil Sharma [mailto:ksharma at suse.com] 
> > Sent: Monday, August 04, 2014 3:31 PM 
> > To: Devi, Yogesh 
> > Cc: matt at cactuar.net; ceph-users at lists.ceph.com; Pulicken, Antony 
> > Subject: Re: [ceph-users] Placement groups forever in "creating"
> state
> > and dont map to OSD 
> > 
> > Hi Yogesh, 
> > 
> > Are your two OSDs on the same node? Could you check the osd tree output
> > with the command "ceph osd tree"? 
> > 
> >  
> > 
> > Regards, 
> > Kapil. 
> > 
> > 
> > 
> > On Mon, 2014-08-04 at 09:22 +0000, Yogesh_Devi at Dell.com wrote:
> > > Dell - Internal Use - Confidential 
> > > 
> > > Matt 
> > > 
> > > I am using Suse Enterprise Linux 11 SP3 (SLES 11 SP3) 
> > > 
> > >  
> > > 
> > > I don't think I have enabled SELinux. 
> > > 
> > >  
> > > 
> > > Yogesh Devi, 
> > > 
> > > Architect,  Dell Cloud Clinical Archive 
> > > 
> > > Dell 
> > > 
> > >  
> > > 
> > >  
> > > 
> > > Land Phone     +91 80 28413000 Extension - 2781
> > > 
> > > Hand Phone    +91 99014 71082
> > > 
> > > 
> > >  
> > > 
> > > From: Matt Harlum [mailto:matt at cactuar.net] 
> > > Sent: Monday, August 04, 2014 1:43 PM 
> > > To: Devi, Yogesh 
> > > Cc: ceph-users at lists.ceph.com; Pulicken, Antony 
> > > Subject: Re: [ceph-users] Placement groups forever in "creating" 
> > state 
> > > and dont map to OSD 
> > > 
> > > 
> > >  
> > > 
> > > Hi 
> > > 
> > >  
> > > 
> > > 
> > > What distributions are your machines using? And is SELinux enabled 
> > > on them? 
> > > 
> > > 
> > >  
> > > 
> > > 
> > > I ran into the same issue once; I had to disable SELinux on all the 
> > > machines and then reinstall. 
> > > 
> > > 
> > >  
> > > 
> > > 
> > >  
> > > 
> > > On 4 Aug 2014, at 5:25 pm, Yogesh_Devi at Dell.com wrote:
> > > 
> > > 
> > > 
> > > 
> > > Dell - Internal Use - Confidential 
> > > 
> > > Matt 
> > > 
> > > 
> > > Thanks for responding 
> > > 
> > > 
> > > As suggested, I tried to set replication to 2x by using the commands 
> > > you provided: 
> > > 
> > > 
> > >  
> > > 
> > > 
> > > $ceph osd pool set data size 2 
> > > 
> > > 
> > > $ceph osd pool set data min_size 2 
> > > 
> > > 
> > > $ceph osd pool set rbd size 2 
> > > 
> > > 
> > > $ceph osd pool set rbd min_size 2 
> > > 
> > > 
> > > $ceph osd pool set metadata size 2 
> > > 
> > > 
> > > $ceph osd pool set metadata min_size 2 
> > > 
> > > 
> > >  
> > > 
> > > 
> > > It told me: 
> > > 
> > > 
> > > set pool 0 size to 2 
> > > 
> > > 
> > > set pool 0 min_size to 2 
> > > 
> > > 
> > > set pool 2 size to 2 
> > > 
> > > 
> > > set pool 2 min_size to 2 
> > > 
> > > 
> > > set pool 1 size to 2 
> > > 
> > > 
> > > set pool 1 min_size to 2 
> > > 
> > > 
> > >  
> > > 
> > > 
> > > To verify that the pool size had indeed changed, I checked again: 
> > > 
> > > 
> > >  
> > > 
> > > 
> > > $ceph osd dump | grep 'rep size'
> > > 
> > > 
> > > pool 0 'data' rep size 2 crush_ruleset 0 object_hash rjenkins pg_num 64 
> > > pgp_num 64 last_change 90 owner 0 crash_replay_interval 45 
> > > 
> > > 
> > > pool 1 'metadata' rep size 2 crush_ruleset 1 object_hash rjenkins
> > > pg_num 64 pgp_num 64 last_change 94 owner 0 
> > > 
> > > 
> > > pool 2 'rbd' rep size 2 crush_ruleset 2 object_hash rjenkins pg_num 64 
> > > pgp_num 64 last_change 92 owner 0 
> > > 
> > > 
> > > pool 3 'datapool' rep size 2 crush_ruleset 2 object_hash rjenkins
> > > pg_num 10 pgp_num 10 last_change 38 owner 0 
> > > 
> > > 
> > >  
> > > 
> > > 
> > >  
> > > 
> > > 
> > > However, my cluster is still in the same state: 
> > > 
> > > 
> > >  
> > > 
> > > 
> > > $ceph -s 
> > > 
> > > 
> > >    health HEALTH_WARN 202 pgs stuck inactive; 202 pgs stuck unclean 
> > > 
> > > 
> > >    monmap e1: 1 mons at {slesceph1=160.110.73.200:6789/0}, election
> > > epoch 1, quorum 0 slesceph1 
> > > 
> > > 
> > >    osdmap e106: 2 osds: 2 up, 2 in 
> > > 
> > > 
> > >     pgmap v171: 202 pgs: 202 creating; 0 bytes data, 10306 MB used, 
> > > 71573 MB / 81880 MB avail 
> > > 
> > > 
> > >    mdsmap e1: 0/0/1 up 
> > > 
> > > 
> > > Yogesh Devi, 
> > > 
> > > 
> > > Architect,  Dell Cloud Clinical Archive 
> > > 
> > > 
> > > Dell 
> > > 
> > > 
> > >  
> > > 
> > > 
> > >  
> > > 
> > > 
> > > Land Phone     +91 80 28413000 Extension - 2781
> > > 
> > > 
> > > Hand Phone    +91 99014 71082
> > > 
> > > 
> > >  
> > > 
> > > 
> > > From: Matt Harlum [mailto:matt at cactuar.net] 
> > > Sent: Saturday, August 02, 2014 6:01 AM 
> > > To: Devi, Yogesh 
> > > Cc: Pulicken, Antony 
> > > Subject: Re: [ceph-users] Placement groups forever in "creating" 
> > state 
> > > and dont map to OSD 
> > > 
> > > 
> > >  
> > > 
> > > 
> > > Hi Yogesh, 
> > > 
> > > 
> > >  
> > > 
> > > 
> > > By default ceph is configured to create 3 replicas of the data; with 
> > > only 2 OSDs it cannot create all of the pgs required to do this. 
> > > 
> > > 
> > >  
> > > 
> > > 
> > > You will need to change the replication to 2x for your pools; this can 
> > > be done like so: 
> > > 
> > > 
> > > ceph osd pool set data size 2 
> > > 
> > > 
> > > ceph osd pool set data min_size 2 
> > > 
> > > 
> > > ceph osd pool set rbd size 2 
> > > 
> > > 
> > > ceph osd pool set rbd min_size 2 
> > > 
> > > 
> > > ceph osd pool set metadata size 2 
> > > 
> > > 
> > > ceph osd pool set metadata min_size 2 
> > > 
> > > 
> > >  
> > > 
> > > 
> > > Once you do this your ceph cluster should go to a healthy state.
> > > 
> > > 
> > >  
> > > 
> > > 
> > > Regards, 
> > > 
> > > 
> > > Matt 
> > > 
> > > 
> > >  
> > > 
> > > 
> > >  
> > > 
> > > 
> > >  
> > > 
> > > 
> > > On 2 Aug 2014, at 12:57 am, Yogesh_Devi at Dell.com wrote:
> > > 
> > > 
> > > 
> > > 
> > > 
> > > Dell - Internal Use - Confidential 
> > > 
> > > Hello Ceph Experts :), 
> > > 
> > > 
> > >  
> > > 
> > > 
> > > I am using ceph (version 0.56.6) on Suse Linux.
> > > 
> > > 
> > > I created a simple cluster with one monitor server and two OSDs.
> > > 
> > > 
> > > The conf file is attached 
> > > 
> > > 
> > >  
> > > 
> > > 
> > > When I start my cluster and do "ceph -s", I see the following 
> > > message: 
> > > 
> > > 
> > >  
> > > 
> > > 
> > > $ceph -s 
> > > 
> > > 
> > > health HEALTH_WARN 202 pgs stuck inactive; 202 pgs stuck unclean 
> > > 
> > > 
> > >    monmap e1: 1 mons at {slesceph1=160.110.73.200:6789/0}, election
> > > epoch 1, quorum 0 slesceph1 
> > > 
> > > 
> > >    osdmap e56: 2 osds: 2 up, 2 in 
> > > 
> > > 
> > >     pgmap v100: 202 pgs: 202 creating; 0 bytes data, 10305 MB used, 
> > > 71574 MB / 81880 MB avail 
> > > 
> > > 
> > >    mdsmap e1: 0/0/1 up 
> > > 
> > > 
> > >  
> > > 
> > > 
> > >  
> > > 
> > > 
> > > Basically there is some problem with my placement groups: they are 
> > > forever stuck in "creating" state and there is no OSD associated with 
> > > them (despite having two OSDs that are up and in). When I do a 
> > > "ceph pg stat" I see the following: 
> > > 
> > > 
> > >  
> > > 
> > > 
> > > $ceph pg stat 
> > > 
> > > 
> > > v100: 202 pgs: 202 creating; 0 bytes data, 10305 MB used, 71574 MB / 
> > > 81880 MB avail 
> > > 
> > > 
> > >  
> > > 
> > > 
> > >  
> > > 
> > > 
> > > If I query any individual pg, then I see it isn't mapped to any 
> > > OSD: 
> > > 
> > > 
> > > $ ceph pg 0.d query 
> > > 
> > > 
> > > pgid currently maps to no osd 
> > > 
> > > 
> > >  
> > > 
> > > 
> > > I tried restarting OSDs and tuning my configuration, to no 
> > > avail. 
> > > 
> > > 
> > >  
> > > 
> > > 
> > > Any suggestions? 
> > > 
> > > 
> > >  
> > > 
> > > 
> > > Yogesh Devi 
> > > 
> > > 
> > > <ceph.conf>_______________________________________________ 
> > > ceph-users mailing list 
> > > ceph-users at lists.ceph.com 
> > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 
> > > 
> > > 
> > >  
> > > 
> > > 
> > > _______________________________________________ 
> > > ceph-users mailing list 
> > > ceph-users at lists.ceph.com 
> > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 
> > 
> >  
> > 
> > 
> 
>  
> 
> 





