Hi Greg,

Thanks for taking a look. Please find the requested outputs below.

ceph osd tree:

# id    weight  type name       up/down reweight
-1      0       root default
-2      0               host osd1
0       0                       osd.0   up      1
4       0                       osd.4   up      1
8       0                       osd.8   up      1
11      0                       osd.11  up      1
-3      0               host osd0
1       0                       osd.1   up      1
3       0                       osd.3   up      1
6       0                       osd.6   up      1
9       0                       osd.9   up      1
-4      0               host osd2
2       0                       osd.2   up      1
5       0                       osd.5   up      1
7       0                       osd.7   up      1
10      0                       osd.10  up      1

ceph -s:

    cluster 4a158d27-f750-41d5-9e7f-26ce4c9d2d45
     health HEALTH_WARN 832 pgs degraded; 832 pgs stuck unclean; recovery 43/86 objects degraded (50.000%)
     monmap e1: 1 mons at {ceph-mon0=192.168.2.10:6789/0}, election epoch 2, quorum 0 ceph-mon0
     osdmap e34: 12 osds: 12 up, 12 in
      pgmap v61: 832 pgs, 8 pools, 840 bytes data, 43 objects
            403 MB used, 10343 MB / 10747 MB avail
            43/86 objects degraded (50.000%)
                 832 active+degraded

Thanks,
Ripal

On Aug 25, 2014, at 12:45 PM, Gregory Farnum <greg at inktank.com> wrote:

> What's the output of "ceph osd tree"? And the full output of "ceph -s"?
> -Greg
> Software Engineer #42 @ http://inktank.com | http://ceph.com
>
>
> On Mon, Aug 18, 2014 at 8:07 PM, Ripal Nathuji <ripal at nathuji.com> wrote:
>> Hi folks,
>>
>> I've come across an issue which I found a "fix" for, but I'm not sure
>> whether it's correct or if there is some other misconfiguration on my end
>> and this is merely a symptom. I'd appreciate any insights anyone could
>> provide based on the information below, and I'm happy to provide more
>> details as necessary.
>>
>> Summary: A fresh install of Ceph 0.80.5 comes up with all pgs marked
>> active+degraded. This reproduces on Ubuntu 12.04 as well as CentOS 7 with
>> a varying number of OSD hosts (1, 2, 3), where each OSD host has four
>> storage drives. The configuration file sets a default replica size of 2
>> and allows CRUSH to choose leaves of type 0 (osd). Specific snippet:
>>
>> [global]
>> ...
>> osd pool default size = 2
>> osd crush chooseleaf type = 0
>>
>>
>> I verified the crush rules were as expected:
>>
>> "rules": [
>>       { "rule_id": 0,
>>         "rule_name": "replicated_ruleset",
>>         "ruleset": 0,
>>         "type": 1,
>>         "min_size": 1,
>>         "max_size": 10,
>>         "steps": [
>>               { "op": "take",
>>                 "item": -1,
>>                 "item_name": "default"},
>>               { "op": "choose_firstn",
>>                 "num": 0,
>>                 "type": "osd"},
>>               { "op": "emit"}]}],
>>
>>
>> Inspecting the pg dump, I observed that every pg had only a single OSD in
>> its up/acting set. That seemed to explain why the pgs were degraded, but
>> it was unclear to me why a second OSD wasn't in the set. After trying a
>> variety of things, I noticed a difference in the default tunables between
>> Emperor (which works fine in these configurations) and Firefly, which
>> comes up with the bobtail profile. The setting
>> choose_local_fallback_tries is 0 in that profile, while it used to
>> default to 5 on Emperor. Sure enough, if I modify my crush map and set
>> the parameter to a non-zero value, the cluster remaps and goes healthy
>> with all pgs active+clean.
>>
>> The documentation states the optimal value of choose_local_fallback_tries
>> is 0 for Firefly, so I'd like to get a better understanding of this
>> parameter and why modifying the default value moves the pgs to a clean
>> state in my scenarios.
>>
>> Thanks,
>> Ripal
>>
>> _______________________________________________
>> ceph-users mailing list
>> ceph-users at lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
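P.S. In case it helps anyone else hitting this, the tunable change I described in my original mail can be made with the standard crushtool export/edit/recompile cycle, roughly as follows (a sketch only; the file names are illustrative, and 5 simply matches the old Emperor default):

    # dump and decompile the crush map currently in use
    ceph osd getcrushmap -o crushmap.bin
    crushtool -d crushmap.bin -o crushmap.txt

    # edit crushmap.txt and change
    #   tunable choose_local_fallback_tries 0
    # to a non-zero value, e.g.
    #   tunable choose_local_fallback_tries 5

    # recompile and inject the modified map
    crushtool -c crushmap.txt -o crushmap.new
    ceph osd setcrushmap -i crushmap.new

    # confirm which tunables are now active
    ceph osd crush show-tunables

Alternatively, "ceph osd crush tunables legacy" switches the whole tunable profile back to the pre-bobtail defaults in one step, though that changes more than just this one value.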