Hi Greg,

Thanks for taking a look. Please find the requested outputs below.

ceph osd tree:

# id    weight  type name       up/down reweight
-1      0       root default
-2      0               host osd1
0       0                       osd.0   up      1
4       0                       osd.4   up      1
8       0                       osd.8   up      1
11      0                       osd.11  up      1
-3      0               host osd0
1       0                       osd.1   up      1
3       0                       osd.3   up      1
6       0                       osd.6   up      1
9       0                       osd.9   up      1
-4      0               host osd2
2       0                       osd.2   up      1
5       0                       osd.5   up      1
7       0                       osd.7   up      1
10      0                       osd.10  up      1

ceph -s:

    cluster 4a158d27-f750-41d5-9e7f-26ce4c9d2d45
     health HEALTH_WARN 832 pgs degraded; 832 pgs stuck unclean; recovery 43/86 objects degraded (50.000%)
     monmap e1: 1 mons at {ceph-mon0=192.168.2.10:6789/0}, election epoch 2, quorum 0 ceph-mon0
     osdmap e34: 12 osds: 12 up, 12 in
      pgmap v61: 832 pgs, 8 pools, 840 bytes data, 43 objects
            403 MB used, 10343 MB / 10747 MB avail
            43/86 objects degraded (50.000%)
                 832 active+degraded

Thanks,
Ripal

On Aug 25, 2014, at 12:45 PM, Gregory Farnum <greg at inktank.com> wrote:

> What's the output of "ceph osd tree"? And the full output of "ceph -s"?
> -Greg
> Software Engineer #42 @ http://inktank.com | http://ceph.com
>
>
> On Mon, Aug 18, 2014 at 8:07 PM, Ripal Nathuji <ripal at nathuji.com> wrote:
>> Hi folks,
>>
>> I've come across an issue which I found a "fix" for, but I'm not sure
>> whether it's correct or if there is some other misconfiguration on my end
>> and this is merely a symptom. I'd appreciate any insights anyone could
>> provide based on the information below, and I'm happy to provide more
>> details as necessary.
>>
>> Summary: A fresh install of Ceph 0.80.5 comes up with all pgs marked
>> active+degraded. This reproduces on Ubuntu 12.04 as well as CentOS 7 with
>> a varying number of OSD hosts (1, 2, 3), where each OSD host has four
>> storage drives. The configuration file sets a default replica size of 2
>> and allows CRUSH to choose leaves of type 0 (osd). Specific snippet:
>>
>> [global]
>> ...
>> osd pool default size = 2
>> osd crush chooseleaf type = 0
>>
>>
>> I verified the crush rules were as expected:
>>
>> "rules": [
>>       { "rule_id": 0,
>>         "rule_name": "replicated_ruleset",
>>         "ruleset": 0,
>>         "type": 1,
>>         "min_size": 1,
>>         "max_size": 10,
>>         "steps": [
>>               { "op": "take",
>>                 "item": -1,
>>                 "item_name": "default"},
>>               { "op": "choose_firstn",
>>                 "num": 0,
>>                 "type": "osd"},
>>               { "op": "emit"}]}],
>>
>>
>> Inspecting the pg dump, I observed that every pg had only a single OSD in
>> its up/acting set. That seemed to explain why the pgs were degraded, but
>> it was unclear to me why a second OSD wasn't in the set. After trying a
>> variety of things, I noticed a difference in the default tunables between
>> Emperor (which works fine in these configurations) and Firefly, which
>> comes up with the bobtail profile. The setting
>> choose_local_fallback_tries is 0 in that profile, while it used to
>> default to 5 on Emperor. Sure enough, if I modify my crush map and set
>> the parameter to a non-zero value, the cluster remaps and goes healthy
>> with all pgs active+clean.
>>
>> The documentation states the optimal value of choose_local_fallback_tries
>> is 0 for Firefly, so I'd like to get a better understanding of this
>> parameter and why modifying the default value moves the pgs to a clean
>> state in my scenarios.
>>
>> Thanks,
>> Ripal
>>
>> _______________________________________________
>> ceph-users mailing list
>> ceph-users at lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
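P.S. In case it helps anyone else hitting this, the tunable change I described in my original mail can be made with the standard crushtool export/edit/recompile cycle, roughly as follows (a sketch only; the file names are illustrative, and 5 simply matches the old Emperor default):

    # dump and decompile the crush map currently in use
    ceph osd getcrushmap -o crushmap.bin
    crushtool -d crushmap.bin -o crushmap.txt

    # edit crushmap.txt and change
    #   tunable choose_local_fallback_tries 0
    # to a non-zero value, e.g.
    #   tunable choose_local_fallback_tries 5

    # recompile and inject the modified map
    crushtool -c crushmap.txt -o crushmap.new
    ceph osd setcrushmap -i crushmap.new

    # confirm which tunables are now active
    ceph osd crush show-tunables

Alternatively, "ceph osd crush tunables legacy" switches the whole tunable profile back to the pre-bobtail defaults in one step, though that changes more than just this one value.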