What's the output of "ceph osd tree"? And the full output of "ceph -s"?
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com

On Mon, Aug 18, 2014 at 8:07 PM, Ripal Nathuji <ripal at nathuji.com> wrote:
> Hi folks,
>
> I've come across an issue which I found a "fix" for, but I'm not sure
> whether it's correct or if there is some other misconfiguration on my end
> and this is merely a symptom. I'd appreciate any insights anyone could
> provide based on the information below, and I'm happy to provide more
> details as necessary.
>
> Summary: A fresh install of Ceph 0.80.5 comes up with all pgs marked as
> active+degraded. This reproduces on Ubuntu 12.04 as well as CentOS 7 with a
> varying number of OSD hosts (1, 2, 3), where each OSD host has four storage
> drives. The configuration file defines a default replica size of 2 and
> allows leaves of type 0 (OSD) to be chosen. Specific snippet:
>
>     [global]
>     ...
>     osd pool default size = 2
>     osd crush chooseleaf type = 0
>
> I verified the crush rules were as expected:
>
>     "rules": [
>           { "rule_id": 0,
>             "rule_name": "replicated_ruleset",
>             "ruleset": 0,
>             "type": 1,
>             "min_size": 1,
>             "max_size": 10,
>             "steps": [
>                   { "op": "take",
>                     "item": -1,
>                     "item_name": "default"},
>                   { "op": "choose_firstn",
>                     "num": 0,
>                     "type": "osd"},
>                   { "op": "emit"}]}],
>
> Inspecting the pg dump I observed that all pgs had a single osd in the
> up/acting sets. That seemed to explain why the pgs were degraded, but it
> was unclear to me why a second OSD wasn't in the set. After trying a
> variety of things, I noticed that there was a difference between Emperor
> (which works fine in these configurations) and Firefly with the default
> tunables, where Firefly comes up with the bobtail profile. The setting
> choose_local_fallback_tries is 0 in this profile, while it used to default
> to 5 on Emperor. Sure enough, if I modify my crush map and set that
> parameter to a non-zero value, the cluster remaps and goes healthy with
> all pgs active+clean.
>
> The documentation states the optimal value of choose_local_fallback_tries
> is 0 for Firefly, so I'd like to get a better understanding of this
> parameter and why modifying the default value moves the pgs to a clean
> state in my scenarios.
>
> Thanks,
> Ripal
>
> _______________________________________________
> ceph-users mailing list
> ceph-users at lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
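
For reference, a minimal sketch of the crushmap round trip one would use to
change a tunable such as choose_local_fallback_tries, as described above.
File names are placeholders and the commands assume an admin node with
client.admin access:

    # export and decompile the current crushmap
    ceph osd getcrushmap -o crushmap.bin
    crushtool -d crushmap.bin -o crushmap.txt

    # edit the tunables at the top of crushmap.txt, e.g.:
    #   tunable choose_local_fallback_tries 5

    # recompile and inject the modified map
    crushtool -c crushmap.txt -o crushmap-new.bin
    ceph osd setcrushmap -i crushmap-new.bin

    # watch the cluster remap
    ceph -w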