Re: Brand new cluster -- pg is stuck inactive

On 10/14/2017 12:53 AM, David Turner wrote:
What does your environment look like?  Someone recently on the mailing list had PGs stuck creating because of a networking issue.

On Fri, Oct 13, 2017 at 2:03 PM Ronny Aasen <ronny+ceph-users@xxxxxxxx> wrote:
Strange that no OSD is acting for your PGs.
Can you show the output from
ceph osd tree


mvh
Ronny Aasen



On 13.10.2017 18:53, dE wrote:
> Hi,
>
>     I'm running ceph 10.2.5 on Debian (official package).
>
> It can't seem to create any functional pools --
>
> ceph health detail
> HEALTH_ERR 64 pgs are stuck inactive for more than 300 seconds; 64 pgs
> stuck inactive; too few PGs per OSD (21 < min 30)
> pg 0.39 is stuck inactive for 652.741684, current state creating, last
> acting []
> pg 0.38 is stuck inactive for 652.741688, current state creating, last
> acting []
> pg 0.37 is stuck inactive for 652.741690, current state creating, last
> acting []
> pg 0.36 is stuck inactive for 652.741692, current state creating, last
> acting []
> pg 0.35 is stuck inactive for 652.741694, current state creating, last
> acting []
> pg 0.34 is stuck inactive for 652.741696, current state creating, last
> acting []
> pg 0.33 is stuck inactive for 652.741698, current state creating, last
> acting []
> pg 0.32 is stuck inactive for 652.741701, current state creating, last
> acting []
> pg 0.3 is stuck inactive for 652.741762, current state creating, last
> acting []
> pg 0.2e is stuck inactive for 652.741715, current state creating, last
> acting []
> pg 0.2d is stuck inactive for 652.741719, current state creating, last
> acting []
> pg 0.2c is stuck inactive for 652.741721, current state creating, last
> acting []
> pg 0.2b is stuck inactive for 652.741723, current state creating, last
> acting []
> pg 0.2a is stuck inactive for 652.741725, current state creating, last
> acting []
> pg 0.29 is stuck inactive for 652.741727, current state creating, last
> acting []
> pg 0.28 is stuck inactive for 652.741730, current state creating, last
> acting []
> pg 0.27 is stuck inactive for 652.741732, current state creating, last
> acting []
> pg 0.26 is stuck inactive for 652.741734, current state creating, last
> acting []
> pg 0.3e is stuck inactive for 652.741707, current state creating, last
> acting []
> pg 0.f is stuck inactive for 652.741761, current state creating, last
> acting []
> pg 0.3f is stuck inactive for 652.741708, current state creating, last
> acting []
> pg 0.10 is stuck inactive for 652.741763, current state creating, last
> acting []
> pg 0.4 is stuck inactive for 652.741773, current state creating, last
> acting []
> pg 0.5 is stuck inactive for 652.741774, current state creating, last
> acting []
> pg 0.3a is stuck inactive for 652.741717, current state creating, last
> acting []
> pg 0.b is stuck inactive for 652.741771, current state creating, last
> acting []
> pg 0.c is stuck inactive for 652.741772, current state creating, last
> acting []
> pg 0.3b is stuck inactive for 652.741721, current state creating, last
> acting []
> pg 0.d is stuck inactive for 652.741774, current state creating, last
> acting []
> pg 0.3c is stuck inactive for 652.741722, current state creating, last
> acting []
> pg 0.e is stuck inactive for 652.741776, current state creating, last
> acting []
> pg 0.3d is stuck inactive for 652.741724, current state creating, last
> acting []
> pg 0.22 is stuck inactive for 652.741756, current state creating, last
> acting []
> pg 0.21 is stuck inactive for 652.741758, current state creating, last
> acting []
> pg 0.a is stuck inactive for 652.741783, current state creating, last
> acting []
> pg 0.20 is stuck inactive for 652.741761, current state creating, last
> acting []
> pg 0.9 is stuck inactive for 652.741787, current state creating, last
> acting []
> pg 0.1f is stuck inactive for 652.741764, current state creating, last
> acting []
> pg 0.8 is stuck inactive for 652.741790, current state creating, last
> acting []
> pg 0.7 is stuck inactive for 652.741792, current state creating, last
> acting []
> pg 0.6 is stuck inactive for 652.741794, current state creating, last
> acting []
> pg 0.1e is stuck inactive for 652.741770, current state creating, last
> acting []
> pg 0.1d is stuck inactive for 652.741772, current state creating, last
> acting []
> pg 0.1c is stuck inactive for 652.741774, current state creating, last
> acting []
> pg 0.1b is stuck inactive for 652.741777, current state creating, last
> acting []
> pg 0.1a is stuck inactive for 652.741784, current state creating, last
> acting []
> pg 0.2 is stuck inactive for 652.741812, current state creating, last
> acting []
> pg 0.31 is stuck inactive for 652.741762, current state creating, last
> acting []
> pg 0.19 is stuck inactive for 652.741789, current state creating, last
> acting []
> pg 0.11 is stuck inactive for 652.741797, current state creating, last
> acting []
> pg 0.18 is stuck inactive for 652.741793, current state creating, last
> acting []
> pg 0.1 is stuck inactive for 652.741820, current state creating, last
> acting []
> pg 0.30 is stuck inactive for 652.741769, current state creating, last
> acting []
> pg 0.17 is stuck inactive for 652.741797, current state creating, last
> acting []
> pg 0.0 is stuck inactive for 652.741829, current state creating, last
> acting []
> pg 0.2f is stuck inactive for 652.741774, current state creating, last
> acting []
> pg 0.16 is stuck inactive for 652.741802, current state creating, last
> acting []
> pg 0.12 is stuck inactive for 652.741807, current state creating, last
> acting []
> pg 0.13 is stuck inactive for 652.741807, current state creating, last
> acting []
> pg 0.14 is stuck inactive for 652.741807, current state creating, last
> acting []
> pg 0.15 is stuck inactive for 652.741808, current state creating, last
> acting []
> pg 0.23 is stuck inactive for 652.741792, current state creating, last
> acting []
> pg 0.24 is stuck inactive for 652.741793, current state creating, last
> acting []
> pg 0.25 is stuck inactive for 652.741793, current state creating, last
> acting []
>
> I have 3 OSDs --
>
> ceph osd stat
>      osdmap e8: 3 osds: 3 up, 3 in
>             flags sortbitwise,require_jewel_osds
>
> ceph osd pool ls detail
> pool 0 'rbd' replicated size 3 min_size 2 crush_ruleset 0 object_hash
> rjenkins pg_num 64 pgp_num 64 last_change 1 flags hashpspool
> stripe_width 0
>
> The state inactive seems to be odd for a brand new pool with no data.
>
> This is my ceph.conf --
>
> [global]
> fsid = 8161c91e-dbd2-4491-adf8-74446bef916a
> auth cluster required = cephx
> auth service required = cephx
> auth client required = cephx
> debug = 10/10
> mon host = 10.242.103.139:8567,10.242.103.140:8567,10.242.103.141:8567
> [mon]
> ms bind ipv6 = false
> mon data = ""
> mon addr = 0.0.0.0:8567
> mon warn on legacy crush tunables = true
> mon crush min required version = jewel
> mon initial members = 0,1,2
> keyring = /etc/ceph/mon_keyring
> log file = /var/log/ceph/mon.log
> [osd]
> osd data = ""
> osd journal = /srv/ceph/osd/osd_journal
> osd journal size = 10240
> osd recovery delay start = 10
> osd recovery thread timeout = 60
> osd recovery max active = 1
> osd recovery max chunk = 10485760
> osd max backfills = 2
> osd backfill retry interval = 60
> osd backfill scan min = 100
> osd backfill scan max = 1000
> keyring = /etc/ceph/osd_keyring
>
> The monitors run on the same host as osds.
>
> Any help will be appreciated highly!
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



These are VMs with a Linux bridge for connectivity.

VLANs have been created over teamed interfaces for the primary interface.

The OSDs show as up and in, and there's a quorum, so this is not a connectivity issue.
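
For anyone wanting to condense a long "ceph health detail" listing like the one quoted above, here is a minimal sketch that counts the stuck PGs and checks which of them report an empty acting set. It assumes the line format shown in this thread (including the mail quote markers and the wrapped "last / acting []" lines); the names parse_stuck_pgs and PG_LINE are made up for illustration and are not part of Ceph.

```python
import re

# Pattern for one entry as it appears in the quoted output, after unwrapping.
# Assumption: the wording matches the Jewel-era lines shown in this thread.
PG_LINE = re.compile(
    r"pg (\S+) is stuck inactive for ([\d.]+), "
    r"current state (\S+), last acting \[([^\]]*)\]"
)


def parse_stuck_pgs(text):
    """Return one dict per stuck PG found in `text` (hypothetical helper)."""
    # Drop mail quote markers ("> ") and undo the line wrapping.
    unquoted = [re.sub(r"^\s*>\s?", "", line) for line in text.splitlines()]
    flat = " ".join(" ".join(unquoted).split())

    pgs = []
    for match in PG_LINE.finditer(flat):
        pgid, seconds, state, acting = match.groups()
        osds = [int(x) for x in acting.split(",") if x.strip()]
        pgs.append(
            {"pgid": pgid, "seconds": float(seconds), "state": state, "acting": osds}
        )
    return pgs


# Two entries copied from the original post.
sample = """\
> pg 0.39 is stuck inactive for 652.741684, current state creating, last
> acting []
> pg 0.38 is stuck inactive for 652.741688, current state creating, last
> acting []
"""

pgs = parse_stuck_pgs(sample)
unmapped = [p["pgid"] for p in pgs if not p["acting"]]
print(len(pgs), "stuck PGs;", len(unmapped), "with an empty acting set")
```

In the output posted here every PG has an empty acting set, which is the detail that prompted the request for "ceph osd tree" earlier in the thread.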
