You might also check out "ceph osd tree" and the crush dump and make sure they look the way you expect.

On Mon, Jan 30, 2017 at 1:23 PM, Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:
> On Sun, Jan 29, 2017 at 6:40 AM, Muthusamy Muthiah
> <muthiah.muthusamy@xxxxxxxxx> wrote:
>> Hi All,
>>
>> Also tried EC profile 3+1 on a 5-node cluster with bluestore enabled. When
>> an OSD is down, the cluster goes to ERROR state even though the cluster is
>> n+1, and no recovery happens.
>>
>>      health HEALTH_ERR
>>             75 pgs are stuck inactive for more than 300 seconds
>>             75 pgs incomplete
>>             75 pgs stuck inactive
>>             75 pgs stuck unclean
>>      monmap e2: 5 mons at
>>             {ca-cn1=10.50.5.117:6789/0,ca-cn2=10.50.5.118:6789/0,ca-cn3=10.50.5.119:6789/0,ca-cn4=10.50.5.120:6789/0,ca-cn5=10.50.5.121:6789/0}
>>             election epoch 10, quorum 0,1,2,3,4 ca-cn1,ca-cn2,ca-cn3,ca-cn4,ca-cn5
>>      mgr active: ca-cn1 standbys: ca-cn4, ca-cn3, ca-cn5, ca-cn2
>>      osdmap e264: 60 osds: 59 up, 59 in; 75 remapped pgs
>>             flags sortbitwise,require_jewel_osds,require_kraken_osds
>>      pgmap v119402: 1024 pgs, 1 pools, 28519 GB data, 21548 kobjects
>>             39976 GB used, 282 TB / 322 TB avail
>>                  941 active+clean
>>                   75 remapped+incomplete
>>                    8 active+clean+scrubbing
>>
>> This seems to be an issue with bluestore; recovery is not happening
>> properly with EC.
>
> It's possible, but it seems a lot more likely this is some kind of
> config issue. Can you share your osd map ("ceph osd getmap")?
> -Greg
>
>> Thanks,
>> Muthu
>>
>> On 24 January 2017 at 12:57, Muthusamy Muthiah
>> <muthiah.muthusamy@xxxxxxxxx> wrote:
>>>
>>> Hi Greg,
>>>
>>> We use EC 4+1 on 5-node clusters in production deployments with filestore,
>>> and recovery and peering do happen when one OSD goes down. After a few
>>> minutes, another OSD on the node hosting the failed OSD takes over its PGs
>>> temporarily and all PGs go to active+clean. The cluster also does not go
>>> down during this recovery process.
>>>
>>> Only on bluestore do we see the cluster going to error state when one OSD
>>> is down. We are still validating this and will let you know any additional
>>> findings.
>>>
>>> Thanks,
>>> Muthu
>>>
>>> On 21 January 2017 at 02:06, Shinobu Kinjo <skinjo@xxxxxxxxxx> wrote:
>>>>
>>>> `ceph pg dump` should show you something like:
>>>>
>>>> * active+undersized+degraded ... [NONE,3,2,4,1] 3 [NONE,3,2,4,1]
>>>>
>>>> Sam,
>>>>
>>>> Am I wrong? Or is it up to something else?
>>>>
>>>> On Sat, Jan 21, 2017 at 4:22 AM, Gregory Farnum <gfarnum@xxxxxxxxxx>
>>>> wrote:
>>>> > I'm pretty sure the default configs won't let an EC PG go active with
>>>> > only "k" OSDs in its PG; it needs at least k+1 (or possibly more? Not
>>>> > certain). Running an "n+1" EC config is just not a good idea.
>>>> > For testing you could probably adjust this with the equivalent of
>>>> > min_size for EC pools, but I don't know the parameters off the top of
>>>> > my head.
>>>> > -Greg
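
A minimal sketch of the min_size check Greg is referring to, for anyone following along; the pool name "ecpool" and profile name "ecprofile" are placeholders (the thread never names them), and the default min_size can differ between releases:

    # show k, m and the failure domain of the profile behind the pool
    ceph osd erasure-code-profile get ecprofile

    # show the pool's current min_size; with k=4 this is commonly k+1 = 5
    ceph osd pool get ecpool min_size

    # testing only: allow PGs to go active with just k shards available
    # (running a production pool this way is exactly the risk noted above)
    ceph osd pool set ecpool min_size 4
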
>>>> >
>>>> > On Fri, Jan 20, 2017 at 2:15 AM, Muthusamy Muthiah
>>>> > <muthiah.muthusamy@xxxxxxxxx> wrote:
>>>> >> Hi,
>>>> >>
>>>> >> We are validating kraken 11.2.0 with bluestore on a 5-node cluster
>>>> >> with EC 4+1.
>>>> >>
>>>> >> When an OSD is down, peering does not happen and the ceph health
>>>> >> status moves to ERR state after a few minutes. This was working in
>>>> >> previous development releases. Is any additional configuration
>>>> >> required in v11.2.0?
>>>> >>
>>>> >> Following is our ceph configuration:
>>>> >>
>>>> >> mon_osd_down_out_interval = 30
>>>> >> mon_osd_report_timeout = 30
>>>> >> mon_osd_down_out_subtree_limit = host
>>>> >> mon_osd_reporter_subtree_level = host
>>>> >>
>>>> >> and the recovery parameters are set to their defaults.
>>>> >>
>>>> >> [root@ca-cn1 ceph]# ceph osd crush show-tunables
>>>> >>
>>>> >> {
>>>> >>     "choose_local_tries": 0,
>>>> >>     "choose_local_fallback_tries": 0,
>>>> >>     "choose_total_tries": 50,
>>>> >>     "chooseleaf_descend_once": 1,
>>>> >>     "chooseleaf_vary_r": 1,
>>>> >>     "chooseleaf_stable": 1,
>>>> >>     "straw_calc_version": 1,
>>>> >>     "allowed_bucket_algs": 54,
>>>> >>     "profile": "jewel",
>>>> >>     "optimal_tunables": 1,
>>>> >>     "legacy_tunables": 0,
>>>> >>     "minimum_required_version": "jewel",
>>>> >>     "require_feature_tunables": 1,
>>>> >>     "require_feature_tunables2": 1,
>>>> >>     "has_v2_rules": 1,
>>>> >>     "require_feature_tunables3": 1,
>>>> >>     "has_v3_rules": 0,
>>>> >>     "has_v4_buckets": 0,
>>>> >>     "require_feature_tunables5": 1,
>>>> >>     "has_v5_rules": 0
>>>> >> }
>>>> >>
>>>> >> ceph status:
>>>> >>
>>>> >>      health HEALTH_ERR
>>>> >>             173 pgs are stuck inactive for more than 300 seconds
>>>> >>             173 pgs incomplete
>>>> >>             173 pgs stuck inactive
>>>> >>             173 pgs stuck unclean
>>>> >>      monmap e2: 5 mons at
>>>> >>             {ca-cn1=10.50.5.117:6789/0,ca-cn2=10.50.5.118:6789/0,ca-cn3=10.50.5.119:6789/0,ca-cn4=10.50.5.120:6789/0,ca-cn5=10.50.5.121:6789/0}
>>>> >>             election epoch 106, quorum 0,1,2,3,4 ca-cn1,ca-cn2,ca-cn3,ca-cn4,ca-cn5
>>>> >>      mgr active: ca-cn1 standbys: ca-cn2, ca-cn4, ca-cn5, ca-cn3
>>>> >>      osdmap e1128: 60 osds: 59 up, 59 in; 173 remapped pgs
>>>> >>             flags sortbitwise,require_jewel_osds,require_kraken_osds
>>>> >>      pgmap v782747: 2048 pgs, 1 pools, 63133 GB data, 46293 kobjects
>>>> >>             85199 GB used, 238 TB / 322 TB avail
>>>> >>                 1868 active+clean
>>>> >>                  173 remapped+incomplete
>>>> >>                    7 active+clean+scrubbing
>>>> >>
>>>> >> MON log:
>>>> >>
>>>> >> 2017-01-20 09:25:54.715684 7f55bcafb700  0 log_channel(cluster) log [INF] :
>>>> >> osd.54 out (down for 31.703786)
>>>> >> 2017-01-20 09:25:54.725688 7f55bf4d5700  0 mon.ca-cn1@0(leader).osd e1120
>>>> >> crush map has features 288250512065953792, adjusting msgr requires
>>>> >> 2017-01-20 09:25:54.729019 7f55bf4d5700  0 log_channel(cluster) log [INF] :
>>>> >> osdmap e1120: 60 osds: 59 up, 59 in
>>>> >> 2017-01-20 09:25:54.735987 7f55bf4d5700  0 log_channel(cluster) log [INF] :
>>>> >> pgmap v781993: 2048 pgs: 1869 active+clean, 173 incomplete, 6
>>>> >> active+clean+scrubbing; 63159 GB data, 85201 GB used, 238 TB / 322 TB
>>>> >> avail; 21825 B/s rd, 163 MB/s wr, 2046 op/s
>>>> >> 2017-01-20 09:25:55.737749 7f55bf4d5700  0 mon.ca-cn1@0(leader).osd e1121
>>>> >> crush map has features 288250512065953792, adjusting msgr requires
>>>> >> 2017-01-20 09:25:55.744338 7f55bf4d5700  0 log_channel(cluster) log [INF] :
>>>> >> osdmap e1121: 60 osds: 59 up, 59 in
>>>> >> 2017-01-20 09:25:55.749616 7f55bf4d5700  0 log_channel(cluster) log [INF] :
>>>> >> pgmap v781994: 2048 pgs: 29 remapped+incomplete, 1869 active+clean, 144
>>>> >> incomplete, 6 active+clean+scrubbing; 63159 GB data, 85201 GB used,
>>>> >> 238 TB / 322 TB avail; 44503 B/s rd, 45681 kB/s wr, 518 op/s
>>>> >> 2017-01-20 09:25:56.768721 7f55bf4d5700  0 log_channel(cluster) log [INF] :
>>>> >> pgmap v781995: 2048 pgs: 47 remapped+incomplete, 1869 active+clean, 126
>>>> >> incomplete, 6 active+clean+scrubbing; 63159 GB data, 85201 GB used,
>>>> >> 238 TB / 322 TB avail; 20275 B/s rd, 72742 kB/s wr, 665 op/s
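
A quick way to see why those PGs sit in remapped+incomplete instead of recovering is to query them directly; a minimal sketch (the PG id 2.1ab below is a placeholder, not taken from this cluster):

    # list the stuck PGs and the detailed health messages
    ceph health detail
    ceph pg dump_stuck inactive

    # query one incomplete PG; its recovery_state section usually shows what
    # peering is blocked on (e.g. down_osds_we_would_probe, or fewer shards
    # up than the erasure-code rule and the pool's min_size allow)
    ceph pg 2.1ab query
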
>>>> >>
>>>> >> Thanks,
>>>> >> Muthu
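
For completeness, the checks suggested at the top of the thread ("ceph osd tree" and the crush dump) and the osdmap Greg asked for can be gathered roughly like this; /tmp/osdmap is just an example output path:

    # topology as CRUSH sees it; confirm each host holds the OSDs you expect
    ceph osd tree

    # full CRUSH map, including the rule used by the EC pool
    ceph osd crush dump

    # binary osdmap to share; osdmaptool can decode it offline
    ceph osd getmap -o /tmp/osdmap
    osdmaptool --print /tmp/osdmap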