Thanks, Wido, for the update.
Yes, I have already tried restarting ceph-mgr, but it didn't help.
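For reference, the restart was done roughly like this (just a sketch; the systemd unit name is an assumption based on the mgr host name in the status output below and may differ in a juju-deployed container):

---------------
# on the node running the active mgr (juju-06c3e9-0-lxd-0)
systemctl restart ceph-mgr@juju-06c3e9-0-lxd-0
systemctl status ceph-mgr@juju-06c3e9-0-lxd-0
---------------

Regarding the firewall question below: one way to rule that out should be to check that the OSD hosts can reach the port the active mgr is listening on (Ceph daemons bind to ports in the 6800-7300 range by default), something like:

---------------
# on the mgr node: find the port(s) ceph-mgr is bound to
ss -tlnp | grep ceph-mgr
# from each OSD host: verify that port is reachable (placeholders to fill in)
nc -zv <mgr-ip> <mgr-port>
---------------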
On Wed, Oct 30, 2019 at 4:30 PM Wido den Hollander <wido@xxxxxxxx> wrote:
On 10/30/19 3:04 AM, soumya tr wrote:
> Hi all,
>
> I have a 3-node ceph cluster set up using juju charms. ceph health shows
> inactive pgs:
>
> ---------------
> # ceph status
>   cluster:
>     id:     0e36956e-ef64-11e9-b472-00163e6e01e8
>     health: HEALTH_WARN
>             Reduced data availability: 114 pgs inactive
>
>   services:
>     mon: 3 daemons, quorum juju-06c3e9-0-lxd-0,juju-06c3e9-2-lxd-0,juju-06c3e9-1-lxd-0
>     mgr: juju-06c3e9-0-lxd-0(active), standbys: juju-06c3e9-1-lxd-0, juju-06c3e9-2-lxd-0
>     osd: 3 osds: 3 up, 3 in
>
>   data:
>     pools:   18 pools, 114 pgs
>     objects: 0 objects, 0 B
>     usage:   3.0 GiB used, 34 TiB / 34 TiB avail
>     pgs:     100.000% pgs unknown
>              114 unknown
> ---------------
>
> PG health as well shows that the PGs are in an inactive state:
>
> -------------------------------
> # ceph health detail
> HEALTH_WARN Reduced data availability: 114 pgs inactive
> PG_AVAILABILITY Reduced data availability: 114 pgs inactive
>     pg 1.0 is stuck inactive for 1454.593774, current state unknown, last acting []
>     pg 1.1 is stuck inactive for 1454.593774, current state unknown, last acting []
>     pg 1.2 is stuck inactive for 1454.593774, current state unknown, last acting []
>     pg 1.3 is stuck inactive for 1454.593774, current state unknown, last acting []
>     pg 1.4 is stuck inactive for 1454.593774, current state unknown, last acting []
>     pg 1.5 is stuck inactive for 1454.593774, current state unknown, last acting []
>     pg 1.6 is stuck inactive for 1454.593774, current state unknown, last acting []
>     pg 1.7 is stuck inactive for 1454.593774, current state unknown, last acting []
>     pg 1.8 is stuck inactive for 1454.593774, current state unknown, last acting []
>     pg 1.9 is stuck inactive for 1454.593774, current state unknown, last acting []
>     pg 1.a is stuck inactive for 1454.593774, current state unknown, last acting []
>     pg 2.0 is stuck inactive for 1454.593774, current state unknown, last acting []
>     pg 2.1 is stuck inactive for 1454.593774, current state unknown, last acting []
>     pg 3.0 is stuck inactive for 1454.593774, current state unknown, last acting []
>     pg 3.1 is stuck inactive for 1454.593774, current state unknown, last acting []
>     pg 4.0 is stuck inactive for 1454.593774, current state unknown, last acting []
>     pg 4.1 is stuck inactive for 1454.593774, current state unknown, last acting []
>     pg 5.0 is stuck inactive for 1454.593774, current state unknown, last acting []
>     pg 5.1 is stuck inactive for 1454.593774, current state unknown, last acting []
>     pg 6.0 is stuck inactive for 1454.593774, current state unknown, last acting []
>     pg 6.1 is stuck inactive for 1454.593774, current state unknown, last acting []
>     pg 7.0 is stuck inactive for 1454.593774, current state unknown, last acting []
>     pg 7.1 is stuck inactive for 1454.593774, current state unknown, last acting []
>     pg 8.0 is stuck inactive for 1454.593774, current state unknown, last acting []
>     pg 8.1 is stuck inactive for 1454.593774, current state unknown, last acting []
>     pg 9.0 is stuck inactive for 1454.593774, current state unknown, last acting []
>     pg 9.1 is stuck inactive for 1454.593774, current state unknown, last acting []
>     pg 10.1 is stuck inactive for 1454.593774, current state unknown, last acting []
>     pg 11.0 is stuck inactive for 1454.593774, current state unknown, last acting []
>     pg 17.10 is stuck inactive for 1454.593774, current state unknown, last acting []
>     pg 17.11 is stuck inactive for 1454.593774, current state unknown, last acting []
>     pg 17.12 is stuck inactive for 1454.593774, current state unknown, last acting []
>     pg 17.13 is stuck inactive for 1454.593774, current state unknown, last acting []
>     pg 17.14 is stuck inactive for 1454.593774, current state unknown, last acting []
>     pg 17.15 is stuck inactive for 1454.593774, current state unknown, last acting []
>     pg 17.16 is stuck inactive for 1454.593774, current state unknown, last acting []
>     pg 17.17 is stuck inactive for 1454.593774, current state unknown, last acting []
>     pg 17.18 is stuck inactive for 1454.593774, current state unknown, last acting []
>     pg 17.19 is stuck inactive for 1454.593774, current state unknown, last acting []
>     pg 17.1a is stuck inactive for 1454.593774, current state unknown, last acting []
>     pg 18.10 is stuck inactive for 1454.593774, current state unknown, last acting []
>     pg 18.11 is stuck inactive for 1454.593774, current state unknown, last acting []
>     pg 18.12 is stuck inactive for 1454.593774, current state unknown, last acting []
>     pg 18.13 is stuck inactive for 1454.593774, current state unknown, last acting []
>     pg 18.14 is stuck inactive for 1454.593774, current state unknown, last acting []
>     pg 18.15 is stuck inactive for 1454.593774, current state unknown, last acting []
>     pg 18.16 is stuck inactive for 1454.593774, current state unknown, last acting []
>     pg 18.17 is stuck inactive for 1454.593774, current state unknown, last acting []
>     pg 18.19 is stuck inactive for 1454.593774, current state unknown, last acting []
>     pg 18.1a is stuck inactive for 1454.593774, current state unknown, last acting []
>     pg 18.1b is stuck inactive for 1454.593774, current state unknown, last acting []
> --------------------------------
>
> But the weird thing is that when I query an individual PG, it's unable to
> find it :(
>
> --------------------------------
> # ceph pg 1.1 query
> Error ENOENT: i don't have pgid 1.1
>
> # ceph pg 18.1a query
> Error ENOENT: i don't have pgid 18.1a
>
> # ceph pg 18.1b query
> Error ENOENT: i don't have pgid 18.1b
> --------------------------------
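>
> (For cross-checking, the PG table as the mgr sees it can also be dumped
> directly; these are standard commands, noted here just for completeness:)
>
> --------------------------------
> # list PGs the mgr considers stuck inactive
> ceph pg dump_stuck inactive
> # brief per-PG listing with state, up and acting sets
> ceph pg dump pgs_brief
> --------------------------------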
>
> As per https://docs.ceph.com/docs/master/rados/operations/pg-states/,
>
> ---------------------------------
> unknown: The ceph-mgr hasn’t yet received any information about the
> PG’s state from an OSD since mgr started up.
> ---------------------------------
>
> I confirmed that all ceph OSDs are up, and that the ceph-mgr service is
> running as well.
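>
> (That was confirmed roughly as follows, using standard commands:)
>
> ---------------
> # OSD up/in summary and CRUSH tree placement
> ceph osd stat
> ceph osd tree
> # active mgr, standbys, and whether the active mgr reports as available
> ceph mgr dump
> ---------------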
>
Did you restart the Mgr? And are there maybe firewalls in between that
might be causing trouble?
This seems like a Mgr issue.
Wido
> Is there anything else that I need to check to rectify the issue?
>
>
> --
> Regards,
> Soumya
>
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
Regards,
Soumya