Hi, here is the info. I have now run "ceph osd pool set rbd pg_num 128", but that seems to lock up as well. Here are the details you're after:

[cephcluster@ceph01-adm01 ceph-deploy]$ ceph osd pool ls detail
pool 0 'rbd' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 128 pgp_num 64 last_change 37 flags hashpspool stripe_width 0
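One thing I notice in that output: pg_num is now 128 but pgp_num is still 64, and if I understand the docs correctly the new PGs are not actually rebalanced until pgp_num is raised to match. So I assume the counterpart command would be something like:

    ceph osd pool set rbd pgp_num 128

(I have not run that yet, since the pg_num change alone already seems to hang things.)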
[cephcluster@ceph01-adm01 ceph-deploy]$ ceph pg dump_stuck
ok
pg_stat  state  up  up_primary  acting  acting_primary
0.2d  stale+undersized+degraded+peered  [0]  0  [0]  0
0.2c  stale+undersized+degraded+peered  [0]  0  [0]  0
0.2b  stale+undersized+degraded+peered  [0]  0  [0]  0
0.2a  stale+undersized+degraded+peered  [0]  0  [0]  0
0.29  stale+undersized+degraded+peered  [0]  0  [0]  0
0.28  stale+undersized+degraded+peered  [0]  0  [0]  0
0.27  stale+undersized+degraded+peered  [0]  0  [0]  0
0.26  stale+undersized+degraded+peered  [0]  0  [0]  0
0.25  stale+undersized+degraded+peered  [0]  0  [0]  0
0.24  stale+undersized+degraded+peered  [0]  0  [0]  0
0.23  stale+undersized+degraded+peered  [0]  0  [0]  0
0.22  stale+undersized+degraded+peered  [0]  0  [0]  0
0.21  stale+undersized+degraded+peered  [0]  0  [0]  0
0.20  stale+undersized+degraded+peered  [0]  0  [0]  0
0.1f  stale+undersized+degraded+peered  [0]  0  [0]  0
0.1e  stale+undersized+degraded+peered  [0]  0  [0]  0
0.1d  stale+undersized+degraded+peered  [0]  0  [0]  0
0.1c  stale+undersized+degraded+peered  [0]  0  [0]  0
0.1b  stale+undersized+degraded+peered  [0]  0  [0]  0
0.1a  stale+undersized+degraded+peered  [0]  0  [0]  0
0.19  stale+undersized+degraded+peered  [0]  0  [0]  0
0.18  stale+undersized+degraded+peered  [0]  0  [0]  0
0.17  stale+undersized+degraded+peered  [0]  0  [0]  0
0.16  stale+undersized+degraded+peered  [0]  0  [0]  0
0.15  stale+undersized+degraded+peered  [0]  0  [0]  0
0.14  stale+undersized+degraded+peered  [0]  0  [0]  0
0.13  stale+undersized+degraded+peered  [0]  0  [0]  0
0.12  stale+undersized+degraded+peered  [0]  0  [0]  0
0.11  stale+undersized+degraded+peered  [0]  0  [0]  0
0.10  stale+undersized+degraded+peered  [0]  0  [0]  0
0.f   stale+undersized+degraded+peered  [0]  0  [0]  0
0.e   stale+undersized+degraded+peered  [0]  0  [0]  0
0.d   stale+undersized+degraded+peered  [0]  0  [0]  0
0.c   stale+undersized+degraded+peered  [0]  0  [0]  0
0.b   stale+undersized+degraded+peered  [0]  0  [0]  0
0.a   stale+undersized+degraded+peered  [0]  0  [0]  0
0.9   stale+undersized+degraded+peered  [0]  0  [0]  0
0.8   stale+undersized+degraded+peered  [0]  0  [0]  0
0.7   stale+undersized+degraded+peered  [0]  0  [0]  0
0.6   stale+undersized+degraded+peered  [0]  0  [0]  0
0.5   stale+undersized+degraded+peered  [0]  0  [0]  0
0.4   stale+undersized+degraded+peered  [0]  0  [0]  0
0.3   stale+undersized+degraded+peered  [0]  0  [0]  0
0.2   stale+undersized+degraded+peered  [0]  0  [0]  0
0.1   stale+undersized+degraded+peered  [0]  0  [0]  0
0.0   stale+undersized+degraded+peered  [0]  0  [0]  0
0.7f  creating  [0,2,1]  0  [0,2,1]  0
0.7e  creating  [2,0,1]  2  [2,0,1]  2
0.7d  creating  [0,2,1]  0  [0,2,1]  0
0.7c  creating  [1,0,2]  1  [1,0,2]  1
0.7b  creating  [0,2,1]  0  [0,2,1]  0
0.7a  creating  [0,2,1]  0  [0,2,1]  0
0.79  creating  [1,0,2]  1  [1,0,2]  1
0.78  creating  [1,0,2]  1  [1,0,2]  1
0.77  creating  [1,0,2]  1  [1,0,2]  1
0.76  creating  [1,2,0]  1  [1,2,0]  1
0.75  creating  [1,2,0]  1  [1,2,0]  1
0.74  creating  [1,2,0]  1  [1,2,0]  1
0.73  creating  [1,2,0]  1  [1,2,0]  1
0.72  creating  [0,2,1]  0  [0,2,1]  0
0.71  creating  [0,2,1]  0  [0,2,1]  0
0.70  creating  [2,0,1]  2  [2,0,1]  2
0.6f  creating  [2,1,0]  2  [2,1,0]  2
0.6e  creating  [0,1,2]  0  [0,1,2]  0
0.6d  creating  [1,2,0]  1  [1,2,0]  1
0.6c  creating  [2,0,1]  2  [2,0,1]  2
0.6b  creating  [1,2,0]  1  [1,2,0]  1
0.6a  creating  [2,1,0]  2  [2,1,0]  2
0.69  creating  [2,0,1]  2  [2,0,1]  2
0.68  creating  [0,1,2]  0  [0,1,2]  0
0.67  creating  [0,1,2]  0  [0,1,2]  0
0.66  creating  [0,1,2]  0  [0,1,2]  0
0.65  creating  [1,0,2]  1  [1,0,2]  1
0.64  creating  [2,0,1]  2  [2,0,1]  2
0.63  creating  [1,2,0]  1  [1,2,0]  1
0.62  creating  [2,1,0]  2  [2,1,0]  2
0.61  creating  [1,2,0]  1  [1,2,0]  1
0.60  creating  [1,0,2]  1  [1,0,2]  1
0.5f  creating  [2,0,1]  2  [2,0,1]  2
0.5e  creating  [1,0,2]  1  [1,0,2]  1
0.5d  creating  [1,0,2]  1  [1,0,2]  1
0.5c  creating  [1,2,0]  1  [1,2,0]  1
0.5b  creating  [1,2,0]  1  [1,2,0]  1
0.5a  creating  [1,0,2]  1  [1,0,2]  1
0.59  creating  [0,2,1]  0  [0,2,1]  0
0.58  creating  [2,0,1]  2  [2,0,1]  2
0.57  creating  [0,1,2]  0  [0,1,2]  0
0.56  creating  [2,1,0]  2  [2,1,0]  2
0.55  creating  [0,2,1]  0  [0,2,1]  0
0.54  creating  [0,2,1]  0  [0,2,1]  0
0.53  creating  [1,2,0]  1  [1,2,0]  1
0.52  creating  [1,2,0]  1  [1,2,0]  1
0.51  creating  [1,2,0]  1  [1,2,0]  1
0.50  creating  [0,2,1]  0  [0,2,1]  0
0.4f  creating  [0,2,1]  0  [0,2,1]  0
0.4e  creating  [0,1,2]  0  [0,1,2]  0
0.4d  creating  [2,1,0]  2  [2,1,0]  2
0.4c  creating  [1,2,0]  1  [1,2,0]  1
0.4b  creating  [0,1,2]  0  [0,1,2]  0
0.4a  creating  [2,1,0]  2  [2,1,0]  2
0.49  creating  [0,1,2]  0  [0,1,2]  0
0.48  creating  [1,2,0]  1  [1,2,0]  1
0.47  creating  [0,2,1]  0  [0,2,1]  0
0.46  creating  [0,2,1]  0  [0,2,1]  0
0.45  creating  [2,0,1]  2  [2,0,1]  2
0.44  creating  [1,2,0]  1  [1,2,0]  1
0.43  creating  [1,0,2]  1  [1,0,2]  1
0.42  creating  [1,0,2]  1  [1,0,2]  1
0.41  creating  [1,2,0]  1  [1,2,0]  1
0.40  creating  [0,1,2]  0  [0,1,2]  0
0.3f  stale+undersized+degraded+peered  [0]  0  [0]  0
0.3e  stale+undersized+degraded+peered  [0]  0  [0]  0
0.3d  stale+undersized+degraded+peered  [0]  0  [0]  0
0.3c  stale+undersized+degraded+peered  [0]  0  [0]  0
0.3b  stale+undersized+degraded+peered  [0]  0  [0]  0
0.3a  stale+undersized+degraded+peered  [0]  0  [0]  0
0.39  stale+undersized+degraded+peered  [0]  0  [0]  0
0.38  stale+undersized+degraded+peered  [0]  0  [0]  0
0.37  stale+undersized+degraded+peered  [0]  0  [0]  0
0.36  stale+undersized+degraded+peered  [0]  0  [0]  0
0.35  stale+undersized+degraded+peered  [0]  0  [0]  0
0.34  stale+undersized+degraded+peered  [0]  0  [0]  0
0.33  stale+undersized+degraded+peered  [0]  0  [0]  0
0.32  stale+undersized+degraded+peered  [0]  0  [0]  0
0.31  stale+undersized+degraded+peered  [0]  0  [0]  0
0.30  stale+undersized+degraded+peered  [0]  0  [0]  0
0.2f  stale+undersized+degraded+peered  [0]  0  [0]  0
0.2e  stale+undersized+degraded+peered  [0]  0  [0]  0
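If it is useful, I can also query one of the stuck PGs directly and post the result, with something like:

    ceph pg 0.2d query

(assuming that command doesn't just hang like everything else does).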
Let me know if you need anything else. When I try to debug, I see a lot of the below in my ceph-osd.log:

2015-09-17 08:57:08.723289 7f2b31eda700 20 heartbeat_map reset_timeout 'FileStore::op_tp thread 0x7f2b31eda700' grace 60 suicide 0
2015-09-17 08:57:09.302722 7f2b1f549700 20 heartbeat_map reset_timeout 'OSD::recovery_tp thread 0x7f2b1f549700' grace 60 suicide 0
2015-09-17 08:57:09.508026 7f2b3aa73700 20 heartbeat_map is_healthy = healthy
2015-09-17 08:57:09.722922 7f2b33ede700 20 filestore(/var/lib/ceph/osd/ceph-0) sync_entry woke after 5.000183
2015-09-17 08:57:09.722956 7f2b33ede700 20 filestore(/var/lib/ceph/osd/ceph-0) sync_entry waiting for max_interval 5.000000
2015-09-17 08:57:09.722987 7f2b326db700 20 heartbeat_map reset_timeout 'FileStore::op_tp thread 0x7f2b326db700' grace 60 suicide 0
2015-09-17 08:57:09.723080 7f2b31eda700 20 heartbeat_map reset_timeout 'FileStore::op_tp thread 0x7f2b31eda700' grace 60 suicide 0
2015-09-17 08:57:09.793157 7f2b1fd4a700 20 heartbeat_map reset_timeout 'OSD::osd_op_tp thread 0x7f2b1fd4a700' grace 15 suicide 150
2015-09-17 08:57:09.793169 7f2b1fd4a700 20 heartbeat_map reset_timeout 'OSD::osd_op_tp thread 0x7f2b1fd4a700' grace 4 suicide 0
2015-09-17 08:57:09.801912 7f2b2154d700 20 heartbeat_map reset_timeout 'OSD::osd_op_tp thread 0x7f2b2154d700' grace 15 suicide 150
2015-09-17 08:57:09.801925 7f2b2154d700 20 heartbeat_map reset_timeout 'OSD::osd_op_tp thread 0x7f2b2154d700' grace 4 suicide 0
2015-09-17 08:57:09.828221 7f2b1e547700 20 heartbeat_map reset_timeout 'OSD::command_tp thread 0x7f2b1e547700' grace 60 suicide 0
2015-09-17 08:57:09.954625 7f2b24553700 20 heartbeat_map reset_timeout 'OSD::osd_op_tp thread 0x7f2b24553700' grace 15 suicide 150
2015-09-17 08:57:09.954646 7f2b24553700 20 heartbeat_map reset_timeout 'OSD::osd_op_tp thread 0x7f2b24553700' grace 4 suicide 0
2015-09-17 08:57:09.989839 7f2b21d4e700 20 heartbeat_map reset_timeout 'OSD::osd_op_tp thread 0x7f2b21d4e700' grace 15 suicide 150
2015-09-17 08:57:09.989852 7f2b21d4e700 20 heartbeat_map reset_timeout 'OSD::osd_op_tp thread 0x7f2b21d4e700' grace 4 suicide 0

> On 17 Sep 2015, at 08:11, Goncalo Borges <goncalo@xxxxxxxxxxxxxxxxxxx> wrote:
>
> Hello Stefan...
>
> Those 64 PGs belong to the default rbd pool, which is created automatically. Can you please give us the output of:
>
> # ceph osd pool ls detail
> # ceph pg dump_stuck
>
> The degraded / stale status means that the PGs cannot be replicated according to your policies.
>
> My guess is that you simply have too few OSDs for the number of replicas you are requesting.
>
> Cheers
> G.
>
>
> On 09/17/2015 02:59 AM, Stefan Eriksson wrote:
>> I have a completely new test cluster: three servers, each of which is both a monitor and an OSD host, and each has one disk.
>> The issue is that ceph status shows: 64 stale+undersized+degraded+peered
>>
>> health:
>>
>>     health HEALTH_WARN
>>            clock skew detected on mon.ceph01-osd03
>>            64 pgs degraded
>>            64 pgs stale
>>            64 pgs stuck degraded
>>            64 pgs stuck inactive
>>            64 pgs stuck stale
>>            64 pgs stuck unclean
>>            64 pgs stuck undersized
>>            64 pgs undersized
>>            too few PGs per OSD (21 < min 30)
>>            Monitor clock skew detected
>>     monmap e1: 3 mons at {ceph01-osd01=192.1.41.51:6789/0,ceph01-osd02=192.1.41.52:6789/0,ceph01-osd03=192.1.41.53:6789/0}
>>            election epoch 82, quorum 0,1,2 ceph01-osd01,ceph01-osd02,ceph01-osd03
>>     osdmap e36: 3 osds: 3 up, 3 in
>>     pgmap v85: 64 pgs, 1 pools, 0 bytes data, 0 objects
>>            101352 kB used, 8365 GB / 8365 GB avail
>>                 64 stale+undersized+degraded+peered
>>
>> ceph osd tree shows:
>>
>> ID WEIGHT  TYPE NAME              UP/DOWN REWEIGHT PRIMARY-AFFINITY
>> -1 8.15996 root default
>> -2 2.71999     host ceph01-osd01
>>  0 2.71999         osd.0               up  1.00000          1.00000
>> -3 2.71999     host ceph01-osd02
>>  1 2.71999         osd.1               up  1.00000          1.00000
>> -4 2.71999     host ceph01-osd03
>>  2 2.71999         osd.2               up  1.00000          1.00000
>>
>> Here is my crush map:
>>
>> # begin crush map
>> tunable choose_local_tries 0
>> tunable choose_local_fallback_tries 0
>> tunable choose_total_tries 50
>> tunable chooseleaf_descend_once 1
>> tunable straw_calc_version 1
>>
>> # devices
>> device 0 osd.0
>> device 1 osd.1
>> device 2 osd.2
>>
>> # types
>> type 0 osd
>> type 1 host
>> type 2 chassis
>> type 3 rack
>> type 4 row
>> type 5 pdu
>> type 6 pod
>> type 7 room
>> type 8 datacenter
>> type 9 region
>> type 10 root
>>
>> # buckets
>> host ceph01-osd01 {
>>         id -2   # do not change unnecessarily
>>         # weight 2.720
>>         alg straw
>>         hash 0  # rjenkins1
>>         item osd.0 weight 2.720
>> }
>> host ceph01-osd02 {
>>         id -3   # do not change unnecessarily
>>         # weight 2.720
>>         alg straw
>>         hash 0  # rjenkins1
>>         item osd.1 weight 2.720
>> }
>> host ceph01-osd03 {
>>         id -4   # do not change unnecessarily
>>         # weight 2.720
>>         alg straw
>>         hash 0  # rjenkins1
>>         item osd.2 weight 2.720
>> }
>> root default {
>>         id -1   # do not change unnecessarily
>>         # weight 8.160
>>         alg straw
>>         hash 0  # rjenkins1
>>         item ceph01-osd01 weight 2.720
>>         item ceph01-osd02 weight 2.720
>>         item ceph01-osd03 weight 2.720
>> }
>>
>> # rules
>> rule replicated_ruleset {
>>         ruleset 0
>>         type replicated
>>         min_size 1
>>         max_size 10
>>         step take default
>>         step chooseleaf firstn 0 type host
>>         step emit
>> }
>>
>> # end crush map
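[Note from me while re-reading: as a sanity check of the map itself, I believe the decompiled text above can be recompiled and the rule dry-run with crushtool, roughly like this (the file names are just examples, crushmap.txt being the text above):

    crushtool -c crushmap.txt -o crushmap.bin
    crushtool -i crushmap.bin --test --rule 0 --num-rep 3 --show-statistics

If that reports three OSDs for every input, the rule can place all three replicas across the three hosts, which would point the problem at the OSDs themselves rather than at CRUSH.]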
>>
>> And here is the ceph.conf, which is shared among all nodes:
>>
>> [global]
>> fsid = b9043917-5f65-98d5-8624-ee12ff32a5ea
>> public_network = 192.1.41.0/24
>> cluster_network = 192.168.0.0/24
>> mon_initial_members = ceph01-osd01, ceph01-osd02, ceph01-osd03
>> mon_host = 192.1.41.51,192.1.41.52,192.1.41.53
>> auth_cluster_required = cephx
>> auth_service_required = cephx
>> auth_client_required = cephx
>> filestore_xattr_use_omap = true
>> osd pool default pg num = 512
>> osd pool default pgp num = 512
>>
>> The logs don't say much; the only log that is actively adding anything is:
>>
>> mon.ceph01-osd01@0(leader).data_health(82) update_stats avail 88% total 9990 MB, used 1170 MB, avail 8819 MB
>> mon.ceph01-osd02@1(peon).data_health(82) update_stats avail 88% total 9990 MB, used 1171 MB, avail 8818 MB
>> mon.ceph01-osd03@2(peon).data_health(82) update_stats avail 88% total 9990 MB, used 1172 MB, avail 8817 MB
>>
>> Does anyone have any thoughts on what might be wrong, or on other info I can provide to help narrow it down?
>>
>> Thanks!
>
> --
> Goncalo Borges
> Research Computing
> ARC Centre of Excellence for Particle Physics at the Terascale
> School of Physics A28 | University of Sydney, NSW 2006
> T: +61 2 93511937

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com