Hi,

Under normal conditions one OSD on one host is not enough to get a cluster healthy. You'd need a minimum of one OSD on each of three hosts to get clean.

Your OSD dump shows "replicated size 3 min_size 2", which means: healthy with three copies of the data, degraded but still usable with two copies, and the cluster stops accepting I/O once only one accessible copy is left.
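For reference, size and min_size are per-pool settings, so you can check and adjust them yourself. A minimal sketch, using the pool names from your dump (the values below are examples for a throw-away test setup, not production advice):

    # show the current replication settings of every pool
    ceph osd dump | grep 'replicated size'

    # example only: let a single-OSD test cluster reach active+clean (no redundancy!)
    ceph osd pool set rbd size 1
    ceph osd pool set rbd min_size 1
    # repeat for the 'data' and 'metadata' pools if you use them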
-Michael

On 27/05/2014 18:38, Sudarsan, Rajesh wrote:
> I am seeing the same error message with ceph health command. I am using Ubuntu 14.04 with ceph 0.79. I am using the ceph distribution that comes with the Ubuntu release. My configuration is
>
> 1 x mon
> 1 x OSD
>
> Both the OSD and mon are on the same host.
>
> rsudarsa at rsudarsa-ce1:~/mycluster$ ceph -s
>     cluster 5330b56b-bfbb-4ff8-aeb8-138233c2bd9a
>      health HEALTH_WARN 192 pgs incomplete; 192 pgs stuck inactive; 192 pgs stuck unclean
>      monmap e1: 1 mons at {rsudarsa-ce2=192.168.252.196:6789/0}, election epoch 2, quorum 0 rsudarsa-ce2
>      osdmap e4: 1 osds: 1 up, 1 in
>       pgmap v12: 192 pgs, 3 pools, 0 bytes data, 0 objects
>             6603 MB used, 856 GB / 908 GB avail
>                  192 incomplete
>
> rsudarsa at rsudarsa-ce1:~/mycluster$ ceph osd tree
> # id    weight  type name               up/down reweight
> -1      0.89    root default
> -2      0.89            host rsudarsa-ce2
> 0       0.89                    osd.0   up      1
>
> rsudarsa at rsudarsa-ce1:~/mycluster$ ceph osd dump
> epoch 4
> fsid 5330b56b-bfbb-4ff8-aeb8-138233c2bd9a
> created 2014-05-27 10:11:33.995272
> modified 2014-05-27 10:13:34.157068
> flags
> pool 0 'data' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 1 owner 0 flags hashpspool crash_replay_interval 45 stripe_width 0
> pool 1 'metadata' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 1 owner 0 flags hashpspool stripe_width 0
> pool 2 'rbd' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 1 owner 0 flags hashpspool stripe_width 0
> max_osd 1
> osd.0 up in weight 1 up_from 4 up_thru 0 down_at 0 last_clean_interval [0,0) 192.168.252.196:6800/7071 192.168.252.196:6801/7071 192.168.252.196:6802/7071 192.168.252.196:6803/7071 exists,up 8b1c2bbb-b2f0-4974-b0f5-266c558cc732
>
> From: ceph-users [mailto:ceph-users-bounces at lists.ceph.com] On Behalf Of jan.zeller at id.unibe.ch
> Sent: Friday, May 23, 2014 6:31 AM
> To: michael at onlinefusion.co.uk; ceph-users at lists.ceph.com
> Subject: Re: pgs incomplete; pgs stuck inactive; pgs stuck unclean
>
> Thanks for your tips & tricks.
>
> This setup is now based on ubuntu 12.04, ceph version 0.80.1
>
> Still using
>
> 1 x mon
> 3 x osds
>
> root at ceph-node2:~# ceph osd tree
> # id    weight  type name               up/down reweight
> -1      0       root default
> -2      0               host ceph-node2
> 0       0                       osd.0   up      1
> -3      0               host ceph-node3
> 1       0                       osd.1   up      1
> -4      0               host ceph-node1
> 2       0                       osd.2   up      1
>
> root at ceph-node2:~# ceph -s
>     cluster c30e1410-fe1a-4924-9112-c7a5d789d273
>      health HEALTH_WARN 192 pgs incomplete; 192 pgs stuck inactive; 192 pgs stuck unclean
>      monmap e1: 1 mons at {ceph-node1=192.168.123.48:6789/0}, election epoch 2, quorum 0 ceph-node1
>      osdmap e11: 3 osds: 3 up, 3 in
>       pgmap v18: 192 pgs, 3 pools, 0 bytes data, 0 objects
>             102 MB used, 15224 MB / 15326 MB avail
>                  192 incomplete
>
> root at ceph-node2:~# cat mycrushmap.txt
> # begin crush map
> tunable choose_local_tries 0
> tunable choose_local_fallback_tries 0
> tunable choose_total_tries 50
> tunable chooseleaf_descend_once 1
>
> # devices
> device 0 osd.0
> device 1 osd.1
> device 2 osd.2
>
> # types
> type 0 osd
> type 1 host
> type 2 chassis
> type 3 rack
> type 4 row
> type 5 pdu
> type 6 pod
> type 7 room
> type 8 datacenter
> type 9 region
> type 10 root
>
> # buckets
> host ceph-node2 {
>         id -2           # do not change unnecessarily
>         # weight 0.000
>         alg straw
>         hash 0  # rjenkins1
>         item osd.0 weight 0.000
> }
> host ceph-node3 {
>         id -3           # do not change unnecessarily
>         # weight 0.000
>         alg straw
>         hash 0  # rjenkins1
>         item osd.1 weight 0.000
> }
> host ceph-node1 {
>         id -4           # do not change unnecessarily
>         # weight 0.000
>         alg straw
>         hash 0  # rjenkins1
>         item osd.2 weight 0.000
> }
> root default {
>         id -1           # do not change unnecessarily
>         # weight 0.000
>         alg straw
>         hash 0  # rjenkins1
>         item ceph-node2 weight 0.000
>         item ceph-node3 weight 0.000
>         item ceph-node1 weight 0.000
> }
>
> # rules
> rule replicated_ruleset {
>         ruleset 0
>         type replicated
>         min_size 1
>         max_size 10
>         step take default
>         step chooseleaf firstn 0 type host
>         step emit
> }
>
> # end crush map
>
> Is there anything wrong with it?
> root at ceph-node2:~# ceph osd dump
> epoch 11
> fsid c30e1410-fe1a-4924-9112-c7a5d789d273
> created 2014-05-23 15:16:57.772981
> modified 2014-05-23 15:18:17.022152
> flags
> pool 0 'data' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 1 owner 0 flags hashpspool crash_replay_interval 45 stripe_width 0
> pool 1 'metadata' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 1 owner 0 flags hashpspool stripe_width 0
> pool 2 'rbd' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 1 owner 0 flags hashpspool stripe_width 0
> max_osd 3
> osd.0 up in weight 1 up_from 4 up_thru 5 down_at 0 last_clean_interval [0,0) 192.168.123.49:6800/4714 192.168.123.49:6801/4714 192.168.123.49:6802/4714 192.168.123.49:6803/4714 exists,up bc991a4b-9e60-4759-b35a-7f58852aa804
> osd.1 up in weight 1 up_from 8 up_thru 0 down_at 0 last_clean_interval [0,0) 192.168.123.50:6800/4685 192.168.123.50:6801/4685 192.168.123.50:6802/4685 192.168.123.50:6803/4685 exists,up bd099d83-2483-42b9-9dbc-7f4e4043ca60
> osd.2 up in weight 1 up_from 11 up_thru 0 down_at 0 last_clean_interval [0,0) 192.168.123.53:6800/16807 192.168.123.53:6801/16807 192.168.123.53:6802/16807 192.168.123.53:6803/16807 exists,up 80a302d0-3493-4c39-b34b-5af233b32ba1
>
> thanks
>
> From: ceph-users [mailto:ceph-users-bounces at lists.ceph.com] On Behalf Of Michael
> Sent: Friday, 23 May 2014 12:36
> To: ceph-users at lists.ceph.com
> Subject: Re: pgs incomplete; pgs stuck inactive; pgs stuck unclean
>
> 64 PGs per pool /shouldn't/ cause any issues while there are only 3 OSDs. It'll be something to pay attention to if a lot more get added, though.
>
> Your replication setup is probably using something other than host. You'll want to extract your crush map, then decompile it and see whether your "step" is set to osd or rack. If it's not host, change it to host and pull the map back in.
>
> Check the docs on crush maps http://ceph.com/docs/master/rados/operations/crush-map/ for more info.
>
> -Michael
>
> On 23/05/2014 10:53, Karan Singh wrote:
> Try increasing the placement groups for pools
>
> ceph osd pool set data pg_num 128
> ceph osd pool set data pgp_num 128
>
> similarly for other 2 pools as well.
>
> - karan -
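(For completeness: the crush map export/edit/import cycle referred to above is usually something along these lines -- the file names here are only placeholders, and these are the same steps the crush-map docs linked above walk through:)

    ceph osd getcrushmap -o crushmap.bin        # grab the compiled map from the cluster
    crushtool -d crushmap.bin -o crushmap.txt   # decompile it into editable text
    # edit crushmap.txt, e.g. check the rule ends in "step chooseleaf firstn 0 type host"
    crushtool -c crushmap.txt -o crushmap.new   # recompile the edited map
    ceph osd setcrushmap -i crushmap.new        # load it back into the cluster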