Re: Brand new cluster -- pg is stuck inactive

On 10/14/2017 08:18 PM, David Turner wrote:

What are the ownership and permissions on your OSD folders? Clock skew detection cares about fractions of a second.

It isn't the networking issue, because your cluster isn't stuck peering. I'm not sure whether the creating state happens on disk or in the cluster.
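(For reference, a quick way to check the ownership side on each node might be something like the sketch below; /var/lib/ceph is only the packaging default and an assumption here, since this cluster sets its own 'osd data' path in ceph.conf.)

# ownership and mode of the OSD data directories
ls -ld /var/lib/ceph/osd/ceph-*
# which user each ceph-osd process is actually running as
ps -C ceph-osd -o user,pid,args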


On Sat, Oct 14, 2017, 10:01 AM dE . <de.techno@xxxxxxxxx> wrote:
I attached a 1 TB disk to each OSD.

cluster 8161c90e-dbd2-4491-acf8-74449bef916a
     health HEALTH_ERR
            clock skew detected on mon.1, mon.2

            64 pgs are stuck inactive for more than 300 seconds
            64 pgs stuck inactive
            too few PGs per OSD (21 < min 30)
            Monitor clock skew detected
     monmap e1: 3 mons at {0=10.247.103.139:8567/0,1=10.247.103.140:8567/0,2=10.247.103.141:8567/0}
            election epoch 12, quorum 0,1,2 0,1,2
     osdmap e10: 3 osds: 3 up, 3 in
            flags sortbitwise,require_jewel_osds
      pgmap v38: 64 pgs, 1 pools, 0 bytes data, 0 objects
            33963 MB used, 3037 GB / 3070 GB avail
                  64 creating

I don't seem to have any clock skew --
for i in {139..141}; do ssh $i date +%s; done
1507989554
1507989554
1507989554
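(A note on the check above: date +%s only has whole-second resolution, while the monitors warn once the offset exceeds mon_clock_drift_allowed, 0.05 s by default, so a sub-second comparison is more telling. A rough sketch, assuming GNU date and ntpd on the nodes:)

for i in {139..141}; do ssh $i date +%s.%N; done   # nanosecond-resolution timestamps
ntpq -p   # run on each node to confirm it is actually syncing to a time source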


On Sat, Oct 14, 2017 at 6:41 PM, David Turner <drakonstein@xxxxxxxxx> wrote:

What is the output of your `ceph status`?


On Fri, Oct 13, 2017, 10:09 PM dE <de.techno@xxxxxxxxx> wrote:
On 10/14/2017 12:53 AM, David Turner wrote:
What does your environment look like?  Someone recently on the mailing list had PGs stuck creating because of a networking issue.

On Fri, Oct 13, 2017 at 2:03 PM Ronny Aasen <ronny+ceph-users@xxxxxxxx> wrote:
Strange that no OSD is acting for your PGs.
Can you show the output from:
ceph osd tree


Best regards,
Ronny Aasen
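(One thing that commonly leaves a brand-new cluster with empty acting sets is OSDs that are up and in but carry a CRUSH weight of 0, or sit outside the default root, so CRUSH never maps any PG to them. A sketch of things to look at, using standard ceph CLI commands; the reweight value is only an example:)

ceph osd tree      # check the WEIGHT column and that the OSDs sit under the default root
ceph osd df        # per-OSD CRUSH weight and utilisation
ceph pg 0.0 query  # may not respond while the PG has no acting OSDs
# if every OSD shows weight 0, giving them a weight lets CRUSH place PGs, e.g.
ceph osd crush reweight osd.0 1.0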



On 13.10.2017 18:53, dE wrote:
> Hi,
>
>     I'm running ceph 10.2.5 on Debian (official package).
>
> It can't seem to create any functional pools --
>
> ceph health detail
> HEALTH_ERR 64 pgs are stuck inactive for more than 300 seconds; 64 pgs
> stuck inactive; too few PGs per OSD (21 < min 30)
> pg 0.39 is stuck inactive for 652.741684, current state creating, last
> acting []
> pg 0.38 is stuck inactive for 652.741688, current state creating, last
> acting []
> pg 0.37 is stuck inactive for 652.741690, current state creating, last
> acting []
> pg 0.36 is stuck inactive for 652.741692, current state creating, last
> acting []
> pg 0.35 is stuck inactive for 652.741694, current state creating, last
> acting []
> pg 0.34 is stuck inactive for 652.741696, current state creating, last
> acting []
> pg 0.33 is stuck inactive for 652.741698, current state creating, last
> acting []
> pg 0.32 is stuck inactive for 652.741701, current state creating, last
> acting []
> pg 0.3 is stuck inactive for 652.741762, current state creating, last
> acting []
> pg 0.2e is stuck inactive for 652.741715, current state creating, last
> acting []
> pg 0.2d is stuck inactive for 652.741719, current state creating, last
> acting []
> pg 0.2c is stuck inactive for 652.741721, current state creating, last
> acting []
> pg 0.2b is stuck inactive for 652.741723, current state creating, last
> acting []
> pg 0.2a is stuck inactive for 652.741725, current state creating, last
> acting []
> pg 0.29 is stuck inactive for 652.741727, current state creating, last
> acting []
> pg 0.28 is stuck inactive for 652.741730, current state creating, last
> acting []
> pg 0.27 is stuck inactive for 652.741732, current state creating, last
> acting []
> pg 0.26 is stuck inactive for 652.741734, current state creating, last
> acting []
> pg 0.3e is stuck inactive for 652.741707, current state creating, last
> acting []
> pg 0.f is stuck inactive for 652.741761, current state creating, last
> acting []
> pg 0.3f is stuck inactive for 652.741708, current state creating, last
> acting []
> pg 0.10 is stuck inactive for 652.741763, current state creating, last
> acting []
> pg 0.4 is stuck inactive for 652.741773, current state creating, last
> acting []
> pg 0.5 is stuck inactive for 652.741774, current state creating, last
> acting []
> pg 0.3a is stuck inactive for 652.741717, current state creating, last
> acting []
> pg 0.b is stuck inactive for 652.741771, current state creating, last
> acting []
> pg 0.c is stuck inactive for 652.741772, current state creating, last
> acting []
> pg 0.3b is stuck inactive for 652.741721, current state creating, last
> acting []
> pg 0.d is stuck inactive for 652.741774, current state creating, last
> acting []
> pg 0.3c is stuck inactive for 652.741722, current state creating, last
> acting []
> pg 0.e is stuck inactive for 652.741776, current state creating, last
> acting []
> pg 0.3d is stuck inactive for 652.741724, current state creating, last
> acting []
> pg 0.22 is stuck inactive for 652.741756, current state creating, last
> acting []
> pg 0.21 is stuck inactive for 652.741758, current state creating, last
> acting []
> pg 0.a is stuck inactive for 652.741783, current state creating, last
> acting []
> pg 0.20 is stuck inactive for 652.741761, current state creating, last
> acting []
> pg 0.9 is stuck inactive for 652.741787, current state creating, last
> acting []
> pg 0.1f is stuck inactive for 652.741764, current state creating, last
> acting []
> pg 0.8 is stuck inactive for 652.741790, current state creating, last
> acting []
> pg 0.7 is stuck inactive for 652.741792, current state creating, last
> acting []
> pg 0.6 is stuck inactive for 652.741794, current state creating, last
> acting []
> pg 0.1e is stuck inactive for 652.741770, current state creating, last
> acting []
> pg 0.1d is stuck inactive for 652.741772, current state creating, last
> acting []
> pg 0.1c is stuck inactive for 652.741774, current state creating, last
> acting []
> pg 0.1b is stuck inactive for 652.741777, current state creating, last
> acting []
> pg 0.1a is stuck inactive for 652.741784, current state creating, last
> acting []
> pg 0.2 is stuck inactive for 652.741812, current state creating, last
> acting []
> pg 0.31 is stuck inactive for 652.741762, current state creating, last
> acting []
> pg 0.19 is stuck inactive for 652.741789, current state creating, last
> acting []
> pg 0.11 is stuck inactive for 652.741797, current state creating, last
> acting []
> pg 0.18 is stuck inactive for 652.741793, current state creating, last
> acting []
> pg 0.1 is stuck inactive for 652.741820, current state creating, last
> acting []
> pg 0.30 is stuck inactive for 652.741769, current state creating, last
> acting []
> pg 0.17 is stuck inactive for 652.741797, current state creating, last
> acting []
> pg 0.0 is stuck inactive for 652.741829, current state creating, last
> acting []
> pg 0.2f is stuck inactive for 652.741774, current state creating, last
> acting []
> pg 0.16 is stuck inactive for 652.741802, current state creating, last
> acting []
> pg 0.12 is stuck inactive for 652.741807, current state creating, last
> acting []
> pg 0.13 is stuck inactive for 652.741807, current state creating, last
> acting []
> pg 0.14 is stuck inactive for 652.741807, current state creating, last
> acting []
> pg 0.15 is stuck inactive for 652.741808, current state creating, last
> acting []
> pg 0.23 is stuck inactive for 652.741792, current state creating, last
> acting []
> pg 0.24 is stuck inactive for 652.741793, current state creating, last
> acting []
> pg 0.25 is stuck inactive for 652.741793, current state creating, last
> acting []
>
> I got 3 OSDs --
>
> ceph osd stat
>      osdmap e8: 3 osds: 3 up, 3 in
>             flags sortbitwise,require_jewel_osds
>
> ceph osd pool ls detail
> pool 0 'rbd' replicated size 3 min_size 2 crush_ruleset 0 object_hash
> rjenkins pg_num 64 pgp_num 64 last_change 1 flags hashpspool
> stripe_width 0
>
> The inactive state seems odd for a brand-new pool with no data.
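(As a side note, the separate 'too few PGs per OSD (21 < min 30)' warning just reflects 64 PGs spread over 3 OSDs; once the PGs actually reach active+clean, it could be cleared by raising the pool's PG count, for example:)

ceph osd pool set rbd pg_num 128
ceph osd pool set rbd pgp_num 128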
>
> This is my ceph.conf --
>
> [global]
> fsid = 8161c91e-dbd2-4491-adf8-74446bef916a
> auth cluster required = cephx
> auth service required = cephx
> auth client required = cephx
> debug = 10/10
> mon host = 10.242.103.139:8567,10.242.103.140:8567,10.242.103.141:8567
> [mon]
> ms bind ipv6 = false
> mon data =
> mon addr = 0.0.0.0:8567
> mon warn on legacy crush tunables = true
> mon crush min required version = jewel
> mon initial members = 0,1,2
> keyring = /etc/ceph/mon_keyring
> log file = /var/log/ceph/mon.log
> [osd]
> osd data =
> osd journal = /srv/ceph/osd/osd_journal
> osd journal size = 10240
> osd recovery delay start = 10
> osd recovery thread timeout = 60
> osd recovery max active = 1
> osd recovery max chunk = 10485760
> osd max backfills = 2
> osd backfill retry interval = 60
> osd backfill scan min = 100
> osd backfill scan max = 1000
> keyring = /etc/ceph/osd_keyring
>
> The monitors run on the same hosts as the OSDs.
>
> Any help will be highly appreciated!
>

These are VMs with a Linux bridge for connectivity.

VLANs have been created over teamed interfaces for the primary interface.

The OSDs can be seen as up and in, and there's a quorum, so it's not a connectivity issue.


The OSD folders are owned ceph:root. I tried ceph:ceph, and also ran ceph-osd as root.
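(For what it's worth, the Jewel packages run ceph-osd as the ceph user by default, so ceph:root ownership can stop the daemon from writing unless it is started as root or 'setuser match path' is set. Probably not the root cause here given the empty acting sets, but a sketch of the usual fix; the path below is only a guess at the configured 'osd data' directory:)

chown -R ceph:ceph /srv/ceph/osd   # hypothetical path; substitute the 'osd data' value from ceph.conf
# then restart the OSD daemons, e.g. via systemd
systemctl restart ceph-osd@0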

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
