Thanks, Wido, for the update.
Yes, I have already tried restarting ceph-mgr, but it didn't help.
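For reference, the restart was done roughly like this (just a sketch; the systemd unit name is an assumption based on the mgr host name in the status output below and may differ in a juju-deployed container):

---------------
# on the node running the active mgr (juju-06c3e9-0-lxd-0)
systemctl restart ceph-mgr@juju-06c3e9-0-lxd-0
systemctl status ceph-mgr@juju-06c3e9-0-lxd-0
---------------

Regarding the firewall question below: one way to rule that out should be to check that the OSD hosts can reach the port the active mgr is listening on (Ceph daemons bind to ports in the 6800-7300 range by default), something like:

---------------
# on the mgr node: find the port(s) ceph-mgr is bound to
ss -tlnp | grep ceph-mgr
# from each OSD host: verify that port is reachable (placeholders to fill in)
nc -zv <mgr-ip> <mgr-port>
---------------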
On Wed, Oct 30, 2019 at 4:30 PM Wido den Hollander <wido@xxxxxxxx> wrote:
On 10/30/19 3:04 AM, soumya tr wrote:
> Hi all,
>
> I have a 3-node ceph cluster set up using juju charms. ceph health shows
> inactive pgs:
>
> ---------------
> # ceph status
>   cluster:
>     id:     0e36956e-ef64-11e9-b472-00163e6e01e8
>     health: HEALTH_WARN
>             Reduced data availability: 114 pgs inactive
>
>   services:
>     mon: 3 daemons, quorum juju-06c3e9-0-lxd-0,juju-06c3e9-2-lxd-0,juju-06c3e9-1-lxd-0
>     mgr: juju-06c3e9-0-lxd-0(active), standbys: juju-06c3e9-1-lxd-0, juju-06c3e9-2-lxd-0
>     osd: 3 osds: 3 up, 3 in
>
>   data:
>     pools:   18 pools, 114 pgs
>     objects: 0 objects, 0 B
>     usage:   3.0 GiB used, 34 TiB / 34 TiB avail
>     pgs:     100.000% pgs unknown
>              114 unknown
> ---------------
>
> PG health as well shows that the PGs are in an inactive state:
>
> -------------------------------
> # ceph health detail
> HEALTH_WARN Reduced data availability: 114 pgs inactive
> PG_AVAILABILITY Reduced data availability: 114 pgs inactive
>     pg 1.0 is stuck inactive for 1454.593774, current state unknown, last acting []
>     pg 1.1 is stuck inactive for 1454.593774, current state unknown, last acting []
>     pg 1.2 is stuck inactive for 1454.593774, current state unknown, last acting []
>     pg 1.3 is stuck inactive for 1454.593774, current state unknown, last acting []
>     pg 1.4 is stuck inactive for 1454.593774, current state unknown, last acting []
>     pg 1.5 is stuck inactive for 1454.593774, current state unknown, last acting []
>     pg 1.6 is stuck inactive for 1454.593774, current state unknown, last acting []
>     pg 1.7 is stuck inactive for 1454.593774, current state unknown, last acting []
>     pg 1.8 is stuck inactive for 1454.593774, current state unknown, last acting []
>     pg 1.9 is stuck inactive for 1454.593774, current state unknown, last acting []
>     pg 1.a is stuck inactive for 1454.593774, current state unknown, last acting []
>     pg 2.0 is stuck inactive for 1454.593774, current state unknown, last acting []
>     pg 2.1 is stuck inactive for 1454.593774, current state unknown, last acting []
>     pg 3.0 is stuck inactive for 1454.593774, current state unknown, last acting []
>     pg 3.1 is stuck inactive for 1454.593774, current state unknown, last acting []
>     pg 4.0 is stuck inactive for 1454.593774, current state unknown, last acting []
>     pg 4.1 is stuck inactive for 1454.593774, current state unknown, last acting []
>     pg 5.0 is stuck inactive for 1454.593774, current state unknown, last acting []
>     pg 5.1 is stuck inactive for 1454.593774, current state unknown, last acting []
>     pg 6.0 is stuck inactive for 1454.593774, current state unknown, last acting []
>     pg 6.1 is stuck inactive for 1454.593774, current state unknown, last acting []
>     pg 7.0 is stuck inactive for 1454.593774, current state unknown, last acting []
>     pg 7.1 is stuck inactive for 1454.593774, current state unknown, last acting []
>     pg 8.0 is stuck inactive for 1454.593774, current state unknown, last acting []
>     pg 8.1 is stuck inactive for 1454.593774, current state unknown, last acting []
>     pg 9.0 is stuck inactive for 1454.593774, current state unknown, last acting []
>     pg 9.1 is stuck inactive for 1454.593774, current state unknown, last acting []
>     pg 10.1 is stuck inactive for 1454.593774, current state unknown, last acting []
>     pg 11.0 is stuck inactive for 1454.593774, current state unknown, last acting []
>     pg 17.10 is stuck inactive for 1454.593774, current state unknown, last acting []
>     pg 17.11 is stuck inactive for 1454.593774, current state unknown, last acting []
>     pg 17.12 is stuck inactive for 1454.593774, current state unknown, last acting []
>     pg 17.13 is stuck inactive for 1454.593774, current state unknown, last acting []
>     pg 17.14 is stuck inactive for 1454.593774, current state unknown, last acting []
>     pg 17.15 is stuck inactive for 1454.593774, current state unknown, last acting []
>     pg 17.16 is stuck inactive for 1454.593774, current state unknown, last acting []
>     pg 17.17 is stuck inactive for 1454.593774, current state unknown, last acting []
>     pg 17.18 is stuck inactive for 1454.593774, current state unknown, last acting []
>     pg 17.19 is stuck inactive for 1454.593774, current state unknown, last acting []
>     pg 17.1a is stuck inactive for 1454.593774, current state unknown, last acting []
>     pg 18.10 is stuck inactive for 1454.593774, current state unknown, last acting []
>     pg 18.11 is stuck inactive for 1454.593774, current state unknown, last acting []
>     pg 18.12 is stuck inactive for 1454.593774, current state unknown, last acting []
>     pg 18.13 is stuck inactive for 1454.593774, current state unknown, last acting []
>     pg 18.14 is stuck inactive for 1454.593774, current state unknown, last acting []
>     pg 18.15 is stuck inactive for 1454.593774, current state unknown, last acting []
>     pg 18.16 is stuck inactive for 1454.593774, current state unknown, last acting []
>     pg 18.17 is stuck inactive for 1454.593774, current state unknown, last acting []
>     pg 18.19 is stuck inactive for 1454.593774, current state unknown, last acting []
>     pg 18.1a is stuck inactive for 1454.593774, current state unknown, last acting []
>     pg 18.1b is stuck inactive for 1454.593774, current state unknown, last acting []
> --------------------------------
>
> But the weird thing is that when I query an individual PG, it's unable to
> find it :(
>
> --------------------------------
> # ceph pg 1.1 query
> Error ENOENT: i don't have pgid 1.1
>
> # ceph pg 18.1a query
> Error ENOENT: i don't have pgid 18.1a
>
> # ceph pg 18.1b query
> Error ENOENT: i don't have pgid 18.1b
> --------------------------------
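>
> (For cross-checking, the PG table as the mgr sees it can also be dumped
> directly; these are standard commands, noted here just for completeness:)
>
> --------------------------------
> # list PGs the mgr considers stuck inactive
> ceph pg dump_stuck inactive
> # brief per-PG listing with state, up and acting sets
> ceph pg dump pgs_brief
> --------------------------------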
>
> As per https://docs.ceph.com/docs/master/rados/operations/pg-states/,
>
> ---------------------------------
> unknown: The ceph-mgr hasn’t yet received any information about the
> PG’s state from an OSD since mgr started up.
> ---------------------------------
>
> I confirmed that all ceph OSDs are up, and that the ceph-mgr service is
> running as well.
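>
> (That was confirmed roughly as follows, using standard commands:)
>
> ---------------
> # OSD up/in summary and CRUSH tree placement
> ceph osd stat
> ceph osd tree
> # active mgr, standbys, and whether the active mgr reports as available
> ceph mgr dump
> ---------------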
>
Did you restart the Mgr? And are there maybe firewalls in between that
might be causing trouble?
This seems like a Mgr issue.
Wido
> Is there anything else that I need to check to rectify the issue?
>
>
> --
> Regards,
> Soumya
>
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
Regards,
Soumya