What are your iptables rules?

----------------
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1
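Stale/peered PGs on OSDs that all report "up" are often a symptom of blocked inter-OSD traffic, which is why the firewall question matters. A minimal sketch of the checks, assuming Ceph's default ports (TCP 6789 for the monitors, TCP 6800-7300 for the OSDs) and hand-managed iptables; adjust interfaces and source subnets to your own setup:

    # List the current rules on each node
    iptables -S

    # Example rules to allow monitor and OSD traffic (illustrative, not from the thread)
    iptables -A INPUT -p tcp --dport 6789 -j ACCEPT
    iptables -A INPUT -p tcp --dport 6800:7300 -j ACCEPT

    # Quick reachability test from another node (host name taken from the cluster below)
    nc -z ceph01-osd02 6789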
On Thu, Sep 17, 2015 at 1:01 AM, Stefan Eriksson wrote:
> Hi, here is the info. I have added "ceph osd pool set rbd pg_num 128", but that seems to lock up as well.
>
> Here are the details you're after:
>
> [cephcluster@ceph01-adm01 ceph-deploy]$ ceph osd pool ls detail
> pool 0 'rbd' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 128 pgp_num 64 last_change 37 flags hashpspool stripe_width 0
>
> [cephcluster@ceph01-adm01 ceph-deploy]$ ceph pg dump_stuck
> ok
> pg_stat state up up_primary acting acting_primary
> 0.2d stale+undersized+degraded+peered [0] 0 [0] 0
> 0.2c stale+undersized+degraded+peered [0] 0 [0] 0
> 0.2b stale+undersized+degraded+peered [0] 0 [0] 0
> 0.2a stale+undersized+degraded+peered [0] 0 [0] 0
> 0.29 stale+undersized+degraded+peered [0] 0 [0] 0
> 0.28 stale+undersized+degraded+peered [0] 0 [0] 0
> 0.27 stale+undersized+degraded+peered [0] 0 [0] 0
> 0.26 stale+undersized+degraded+peered [0] 0 [0] 0
> 0.25 stale+undersized+degraded+peered [0] 0 [0] 0
> 0.24 stale+undersized+degraded+peered [0] 0 [0] 0
> 0.23 stale+undersized+degraded+peered [0] 0 [0] 0
> 0.22 stale+undersized+degraded+peered [0] 0 [0] 0
> 0.21 stale+undersized+degraded+peered [0] 0 [0] 0
> 0.20 stale+undersized+degraded+peered [0] 0 [0] 0
> 0.1f stale+undersized+degraded+peered [0] 0 [0] 0
> 0.1e stale+undersized+degraded+peered [0] 0 [0] 0
> 0.1d stale+undersized+degraded+peered [0] 0 [0] 0
> 0.1c stale+undersized+degraded+peered [0] 0 [0] 0
> 0.1b stale+undersized+degraded+peered [0] 0 [0] 0
> 0.1a stale+undersized+degraded+peered [0] 0 [0] 0
> 0.19 stale+undersized+degraded+peered [0] 0 [0] 0
> 0.18 stale+undersized+degraded+peered [0] 0 [0] 0
> 0.17 stale+undersized+degraded+peered [0] 0 [0] 0
> 0.16 stale+undersized+degraded+peered [0] 0 [0] 0
> 0.15 stale+undersized+degraded+peered [0] 0 [0] 0
> 0.14 stale+undersized+degraded+peered [0] 0 [0] 0
> 0.13 stale+undersized+degraded+peered [0] 0 [0] 0
> 0.12 stale+undersized+degraded+peered [0] 0 [0] 0
> 0.11 stale+undersized+degraded+peered [0] 0 [0] 0
> 0.10 stale+undersized+degraded+peered [0] 0 [0] 0
> 0.f stale+undersized+degraded+peered [0] 0 [0] 0
> 0.e stale+undersized+degraded+peered [0] 0 [0] 0
> 0.d stale+undersized+degraded+peered [0] 0 [0] 0
> 0.c stale+undersized+degraded+peered [0] 0 [0] 0
> 0.b stale+undersized+degraded+peered [0] 0 [0] 0
> 0.a stale+undersized+degraded+peered [0] 0 [0] 0
> 0.9 stale+undersized+degraded+peered [0] 0 [0] 0
> 0.8 stale+undersized+degraded+peered [0] 0 [0] 0
> 0.7 stale+undersized+degraded+peered [0] 0 [0] 0
> 0.6 stale+undersized+degraded+peered [0] 0 [0] 0
> 0.5 stale+undersized+degraded+peered [0] 0 [0] 0
> 0.4 stale+undersized+degraded+peered [0] 0 [0] 0
> 0.3 stale+undersized+degraded+peered [0] 0 [0] 0
> 0.2 stale+undersized+degraded+peered [0] 0 [0] 0
> 0.1 stale+undersized+degraded+peered [0] 0 [0] 0
> 0.0 stale+undersized+degraded+peered [0] 0 [0] 0
> 0.7f creating [0,2,1] 0 [0,2,1] 0
> 0.7e creating [2,0,1] 2 [2,0,1] 2
> 0.7d creating [0,2,1] 0 [0,2,1] 0
> 0.7c creating [1,0,2] 1 [1,0,2] 1
> 0.7b creating [0,2,1] 0 [0,2,1] 0
> 0.7a creating [0,2,1] 0 [0,2,1] 0
> 0.79 creating [1,0,2] 1 [1,0,2] 1
> 0.78 creating [1,0,2] 1 [1,0,2] 1
> 0.77 creating [1,0,2] 1 [1,0,2] 1
> 0.76 creating [1,2,0] 1 [1,2,0] 1
> 0.75 creating [1,2,0] 1 [1,2,0] 1
> 0.74 creating [1,2,0] 1 [1,2,0] 1
> 0.73 creating [1,2,0] 1 [1,2,0] 1
> 0.72 creating [0,2,1] 0 [0,2,1] 0
> 0.71 creating [0,2,1] 0 [0,2,1] 0
> 0.70 creating [2,0,1] 2 [2,0,1] 2
> 0.6f creating [2,1,0] 2 [2,1,0] 2
> 0.6e creating [0,1,2] 0 [0,1,2] 0
> 0.6d creating [1,2,0] 1 [1,2,0] 1
> 0.6c creating [2,0,1] 2 [2,0,1] 2
> 0.6b creating [1,2,0] 1 [1,2,0] 1
> 0.6a creating [2,1,0] 2 [2,1,0] 2
> 0.69 creating [2,0,1] 2 [2,0,1] 2
> 0.68 creating [0,1,2] 0 [0,1,2] 0
> 0.67 creating [0,1,2] 0 [0,1,2] 0
> 0.66 creating [0,1,2] 0 [0,1,2] 0
> 0.65 creating [1,0,2] 1 [1,0,2] 1
> 0.64 creating [2,0,1] 2 [2,0,1] 2
> 0.63 creating [1,2,0] 1 [1,2,0] 1
> 0.62 creating [2,1,0] 2 [2,1,0] 2
> 0.61 creating [1,2,0] 1 [1,2,0] 1
> 0.60 creating [1,0,2] 1 [1,0,2] 1
> 0.5f creating [2,0,1] 2 [2,0,1] 2
> 0.5e creating [1,0,2] 1 [1,0,2] 1
> 0.5d creating [1,0,2] 1 [1,0,2] 1
> 0.5c creating [1,2,0] 1 [1,2,0] 1
> 0.5b creating [1,2,0] 1 [1,2,0] 1
> 0.5a creating [1,0,2] 1 [1,0,2] 1
> 0.59 creating [0,2,1] 0 [0,2,1] 0
> 0.58 creating [2,0,1] 2 [2,0,1] 2
> 0.57 creating [0,1,2] 0 [0,1,2] 0
> 0.56 creating [2,1,0] 2 [2,1,0] 2
> 0.55 creating [0,2,1] 0 [0,2,1] 0
> 0.54 creating [0,2,1] 0 [0,2,1] 0
> 0.53 creating [1,2,0] 1 [1,2,0] 1
> 0.52 creating [1,2,0] 1 [1,2,0] 1
> 0.51 creating [1,2,0] 1 [1,2,0] 1
> 0.50 creating [0,2,1] 0 [0,2,1] 0
> 0.4f creating [0,2,1] 0 [0,2,1] 0
> 0.4e creating [0,1,2] 0 [0,1,2] 0
> 0.4d creating [2,1,0] 2 [2,1,0] 2
> 0.4c creating [1,2,0] 1 [1,2,0] 1
> 0.4b creating [0,1,2] 0 [0,1,2] 0
> 0.4a creating [2,1,0] 2 [2,1,0] 2
> 0.49 creating [0,1,2] 0 [0,1,2] 0
> 0.48 creating [1,2,0] 1 [1,2,0] 1
> 0.47 creating [0,2,1] 0 [0,2,1] 0
> 0.46 creating [0,2,1] 0 [0,2,1] 0
> 0.45 creating [2,0,1] 2 [2,0,1] 2
> 0.44 creating [1,2,0] 1 [1,2,0] 1
> 0.43 creating [1,0,2] 1 [1,0,2] 1
> 0.42 creating [1,0,2] 1 [1,0,2] 1
> 0.41 creating [1,2,0] 1 [1,2,0] 1
> 0.40 creating [0,1,2] 0 [0,1,2] 0
> 0.3f stale+undersized+degraded+peered [0] 0 [0] 0
> 0.3e stale+undersized+degraded+peered [0] 0 [0] 0
> 0.3d stale+undersized+degraded+peered [0] 0 [0] 0
> 0.3c stale+undersized+degraded+peered [0] 0 [0] 0
> 0.3b stale+undersized+degraded+peered [0] 0 [0] 0
> 0.3a stale+undersized+degraded+peered [0] 0 [0] 0
> 0.39 stale+undersized+degraded+peered [0] 0 [0] 0
> 0.38 stale+undersized+degraded+peered [0] 0 [0] 0
> 0.37 stale+undersized+degraded+peered [0] 0 [0] 0
> 0.36 stale+undersized+degraded+peered [0] 0 [0] 0
> 0.35 stale+undersized+degraded+peered [0] 0 [0] 0
> 0.34 stale+undersized+degraded+peered [0] 0 [0] 0
> 0.33 stale+undersized+degraded+peered [0] 0 [0] 0
> 0.32 stale+undersized+degraded+peered [0] 0 [0] 0
> 0.31 stale+undersized+degraded+peered [0] 0 [0] 0
> 0.30 stale+undersized+degraded+peered [0] 0 [0] 0
> 0.2f stale+undersized+degraded+peered [0] 0 [0] 0
> 0.2e stale+undersized+degraded+peered [0] 0 [0] 0
>
> Let me know if you need anything else. I tried to debug and I see a lot of the below in my ceph-osd.log:
>
> 2015-09-17 08:57:08.723289 7f2b31eda700 20 heartbeat_map reset_timeout 'FileStore::op_tp thread 0x7f2b31eda700' grace 60 suicide 0
> 2015-09-17 08:57:09.302722 7f2b1f549700 20 heartbeat_map reset_timeout 'OSD::recovery_tp thread 0x7f2b1f549700' grace 60 suicide 0
> 2015-09-17 08:57:09.508026 7f2b3aa73700 20 heartbeat_map is_healthy = healthy
> 2015-09-17 08:57:09.722922 7f2b33ede700 20 filestore(/var/lib/ceph/osd/ceph-0) sync_entry woke after 5.000183
> 2015-09-17 08:57:09.722956 7f2b33ede700 20 filestore(/var/lib/ceph/osd/ceph-0) sync_entry waiting for max_interval 5.000000
> 2015-09-17 08:57:09.722987 7f2b326db700 20 heartbeat_map reset_timeout 'FileStore::op_tp thread 0x7f2b326db700' grace 60 suicide 0
> 2015-09-17 08:57:09.723080 7f2b31eda700 20 heartbeat_map reset_timeout 'FileStore::op_tp thread 0x7f2b31eda700' grace 60 suicide 0
> 2015-09-17 08:57:09.793157 7f2b1fd4a700 20 heartbeat_map reset_timeout 'OSD::osd_op_tp thread 0x7f2b1fd4a700' grace 15 suicide 150
> 2015-09-17 08:57:09.793169 7f2b1fd4a700 20 heartbeat_map reset_timeout 'OSD::osd_op_tp thread 0x7f2b1fd4a700' grace 4 suicide 0
> 2015-09-17 08:57:09.801912 7f2b2154d700 20 heartbeat_map reset_timeout 'OSD::osd_op_tp thread 0x7f2b2154d700' grace 15 suicide 150
> 2015-09-17 08:57:09.801925 7f2b2154d700 20 heartbeat_map reset_timeout 'OSD::osd_op_tp thread 0x7f2b2154d700' grace 4 suicide 0
> 2015-09-17 08:57:09.828221 7f2b1e547700 20 heartbeat_map reset_timeout 'OSD::command_tp thread 0x7f2b1e547700' grace 60 suicide 0
> 2015-09-17 08:57:09.954625 7f2b24553700 20 heartbeat_map reset_timeout 'OSD::osd_op_tp thread 0x7f2b24553700' grace 15 suicide 150
> 2015-09-17 08:57:09.954646 7f2b24553700 20 heartbeat_map reset_timeout 'OSD::osd_op_tp thread 0x7f2b24553700' grace 4 suicide 0
> 2015-09-17 08:57:09.989839 7f2b21d4e700 20 heartbeat_map reset_timeout 'OSD::osd_op_tp thread 0x7f2b21d4e700' grace 15 suicide 150
> 2015-09-17 08:57:09.989852 7f2b21d4e700 20 heartbeat_map reset_timeout 'OSD::osd_op_tp thread 0x7f2b21d4e700' grace 4 suicide 0
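One detail worth noting in the pool listing above: pg_num was raised to 128 while pgp_num is still 64. After increasing pg_num, pgp_num normally has to be raised to the same value before the new placement groups are actually remapped. A sketch of the follow-up, using the same pool and commands already shown in the thread (and assuming the set command completes; here it was reported to hang, which again points at a peering or network problem rather than the PG counts themselves):

    # Bring pgp_num in line with pg_num for the rbd pool
    ceph osd pool set rbd pgp_num 128

    # Re-check the pool and PG states afterwards
    ceph osd pool ls detail
    ceph pg dump_stuck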
>> On 17 Sep 2015, at 08:11, Goncalo Borges wrote:
>>
>> Hello Stefan...
>>
>> Those 64 PGs refer to the default rbd pool that is created automatically. Can you please give us the output of
>>
>> # ceph osd pool ls detail
>> # ceph pg dump_stuck
>>
>> The degraded / stale status means that the PGs cannot be replicated according to your policies.
>>
>> My guess is that you simply have too few OSDs for the number of replicas you are requesting.
>>
>> Cheers
>> G.
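Goncalo's guess is easy to verify: with the pool at size 3 / min_size 2 (visible in the pool listing above) and three OSDs on three separate hosts, the replica count itself is satisfiable, so the checks below mostly serve to rule that out. A quick sketch, assuming the default rbd pool:

    # Current replication settings for the pool
    ceph osd pool get rbd size
    ceph osd pool get rbd min_size

    # If there really were fewer hosts than replicas, shrinking the pool would be one option, e.g.:
    # ceph osd pool set rbd size 2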
>>
>> On 09/17/2015 02:59 AM, Stefan Eriksson wrote:
>>> I have a completely new cluster for testing; it's three servers which are all monitors and OSD hosts, and they each have one disk.
>>> The issue is that ceph status shows: 64 stale+undersized+degraded+peered
>>>
>>> health:
>>>
>>>     health HEALTH_WARN
>>>            clock skew detected on mon.ceph01-osd03
>>>            64 pgs degraded
>>>            64 pgs stale
>>>            64 pgs stuck degraded
>>>            64 pgs stuck inactive
>>>            64 pgs stuck stale
>>>            64 pgs stuck unclean
>>>            64 pgs stuck undersized
>>>            64 pgs undersized
>>>            too few PGs per OSD (21 < min 30)
>>>            Monitor clock skew detected
>>>     monmap e1: 3 mons at {ceph01-osd01=192.1.41.51:6789/0,ceph01-osd02=192.1.41.52:6789/0,ceph01-osd03=192.1.41.53:6789/0}
>>>            election epoch 82, quorum 0,1,2 ceph01-osd01,ceph01-osd02,ceph01-osd03
>>>     osdmap e36: 3 osds: 3 up, 3 in
>>>     pgmap v85: 64 pgs, 1 pools, 0 bytes data, 0 objects
>>>            101352 kB used, 8365 GB / 8365 GB avail
>>>            64 stale+undersized+degraded+peered
>>>
>>> ceph osd tree shows:
>>> ID WEIGHT  TYPE NAME             UP/DOWN REWEIGHT PRIMARY-AFFINITY
>>> -1 8.15996 root default
>>> -2 2.71999     host ceph01-osd01
>>>  0 2.71999         osd.0              up 1.00000  1.00000
>>> -3 2.71999     host ceph01-osd02
>>>  1 2.71999         osd.1              up 1.00000  1.00000
>>> -4 2.71999     host ceph01-osd03
>>>  2 2.71999         osd.2              up 1.00000  1.00000
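The "clock skew detected on mon.ceph01-osd03" warning is separate from the PG problem, but it is worth clearing while debugging, since the monitors are sensitive to time drift. A minimal check, assuming the nodes run ntpd (substitute the equivalent chrony commands if that is what is installed):

    # Show which monitors the skew affects
    ceph health detail | grep -i skew

    # On each monitor node, check NTP peers and offsets
    ntpq -p

    # If the offset is large, resync and restart the time daemon
    # (service name varies by distribution):
    # systemctl restart ntpd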
>>>
>>> Here is my crushmap:
>>>
>>> # begin crush map
>>> tunable choose_local_tries 0
>>> tunable choose_local_fallback_tries 0
>>> tunable choose_total_tries 50
>>> tunable chooseleaf_descend_once 1
>>> tunable straw_calc_version 1
>>>
>>> # devices
>>> device 0 osd.0
>>> device 1 osd.1
>>> device 2 osd.2
>>>
>>> # types
>>> type 0 osd
>>> type 1 host
>>> type 2 chassis
>>> type 3 rack
>>> type 4 row
>>> type 5 pdu
>>> type 6 pod
>>> type 7 room
>>> type 8 datacenter
>>> type 9 region
>>> type 10 root
>>>
>>> # buckets
>>> host ceph01-osd01 {
>>>     id -2           # do not change unnecessarily
>>>     # weight 2.720
>>>     alg straw
>>>     hash 0          # rjenkins1
>>>     item osd.0 weight 2.720
>>> }
>>> host ceph01-osd02 {
>>>     id -3           # do not change unnecessarily
>>>     # weight 2.720
>>>     alg straw
>>>     hash 0          # rjenkins1
>>>     item osd.1 weight 2.720
>>> }
>>> host ceph01-osd03 {
>>>     id -4           # do not change unnecessarily
>>>     # weight 2.720
>>>     alg straw
>>>     hash 0          # rjenkins1
>>>     item osd.2 weight 2.720
>>> }
>>> root default {
>>>     id -1           # do not change unnecessarily
>>>     # weight 8.160
>>>     alg straw
>>>     hash 0          # rjenkins1
>>>     item ceph01-osd01 weight 2.720
>>>     item ceph01-osd02 weight 2.720
>>>     item ceph01-osd03 weight 2.720
>>> }
>>>
>>> # rules
>>> rule replicated_ruleset {
>>>     ruleset 0
>>>     type replicated
>>>     min_size 1
>>>     max_size 10
>>>     step take default
>>>     step chooseleaf firstn 0 type host
>>>     step emit
>>> }
>>>
>>> # end crush map
>>>
>>> And the ceph.conf which is shared among all nodes:
>>>
>>> ceph.conf
>>> [global]
>>> fsid = b9043917-5f65-98d5-8624-ee12ff32a5ea
>>> public_network = 192.1.41.0/24
>>> cluster_network = 192.168.0.0/24
>>> mon_initial_members = ceph01-osd01, ceph01-osd02, ceph01-osd03
>>> mon_host = 192.1.41.51,192.1.41.52,192.1.41.53
>>> auth_cluster_required = cephx
>>> auth_service_required = cephx
>>> auth_client_required = cephx
>>> filestore_xattr_use_omap = true
>>> osd pool default pg num = 512
>>> osd pool default pgp num = 512
>>>
>>> The logs don't say much; the only active log which adds something is:
>>>
>>> mon.ceph01-osd01@0(leader).data_health(82) update_stats avail 88% total 9990 MB, used 1170 MB, avail 8819 MB
>>> mon.ceph01-osd02@1(peon).data_health(82) update_stats avail 88% total 9990 MB, used 1171 MB, avail 8818 MB
>>> mon.ceph01-osd03@2(peon).data_health(82) update_stats avail 88% total 9990 MB, used 1172 MB, avail 8817 MB
>>>
>>> Does anyone have any thoughts on what might be wrong? Or is there other info I can provide to ease the search for what it might be?
>>>
>>> Thanks!
>>
>> --
>> Goncalo Borges
>> Research Computing
>> ARC Centre of Excellence for Particle Physics at the Terascale
>> School of Physics A28 | University of Sydney, NSW 2006
>> T: +61 2 93511937
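On the "too few PGs per OSD (21 < min 30)" warning in the status above: the rule of thumb commonly cited in the Ceph documentation of this period is roughly (number of OSDs x 100) / replica count, rounded up to the next power of two. For this cluster that works out to (3 x 100) / 3 = 100, rounded up to 128, which matches the pg_num already applied earlier in the thread; the 512 default in ceph.conf would be oversized for a cluster this small. A short verification pass after the firewall, clock, and pgp_num follow-ups sketched above:

    # Overall state, remaining inactive PGs, and pool parameters
    ceph -s
    ceph pg dump_stuck inactive
    ceph osd pool ls detail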