Bluestore: v11.2.0 peering not happening when OSD is down

Muthusamy Muthiah <muthiah.muthusamy@xxxxxxxxx> · Fri, 20 Jan 2017 15:45:10 +0530

Hi,
We are validating Kraken 11.2.0 with BlueStore on a 5-node cluster using an EC 4+1 pool.

When an OSD goes down, peering does not happen and the cluster health moves to HEALTH_ERR after a few minutes. This worked in previous development releases. Is any additional configuration required in v11.2.0?

Our Ceph configuration is as follows:

mon_osd_down_out_interval = 30    
mon_osd_report_timeout = 30
mon_osd_down_out_subtree_limit = host
mon_osd_reporter_subtree_level = host

The recovery parameters are left at their defaults.
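A simplified model of how `mon_osd_down_out_interval` and `mon_osd_down_out_subtree_limit` interact when the monitor decides whether to mark a down OSD out. This is a hypothetical helper, not Ceph code. It ignores `mon_osd_report_timeout`, the `noout` flag, subtree types other than `host`, and any other adjustments the monitor may apply.

```python
def should_mark_out(down_for_s: float,
                    down_out_interval_s: float = 30.0,   # mon_osd_down_out_interval
                    whole_host_down: bool = False,
                    subtree_limit: str = "host") -> bool:  # mon_osd_down_out_subtree_limit
    """Hypothetical, simplified down -> out decision for a single OSD."""
    # If every OSD under a host is down and the subtree limit is "host",
    # the OSDs are left "in" instead of being marked out automatically.
    if subtree_limit == "host" and whole_host_down:
        return False
    # Otherwise an OSD that has been down for at least the interval is marked out.
    return down_for_s >= down_out_interval_s


# A single OSD down for ~31.7 s, as in the MON log in this post.
print(should_mark_out(31.703786))
```

With these settings, the MON log in this post shows osd.54 being marked out after about 31.7 s down, which matches the 30 s interval.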

[root@ca-cn1 ceph]# ceph osd crush show-tunables

{
    "choose_local_tries": 0,
    "choose_local_fallback_tries": 0,
    "choose_total_tries": 50,
    "chooseleaf_descend_once": 1,
    "chooseleaf_vary_r": 1,
    "chooseleaf_stable": 1,
    "straw_calc_version": 1,
    "allowed_bucket_algs": 54,
    "profile": "jewel",
    "optimal_tunables": 1,
    "legacy_tunables": 0,
    "minimum_required_version": "jewel",
    "require_feature_tunables": 1,
    "require_feature_tunables2": 1,
    "has_v2_rules": 1,
    "require_feature_tunables3": 1,
    "has_v3_rules": 0,
    "has_v4_buckets": 0,
    "require_feature_tunables5": 1,
    "has_v5_rules": 0
}

ceph status:

     health HEALTH_ERR
            173 pgs are stuck inactive for more than 300 seconds
            173 pgs incomplete
            173 pgs stuck inactive
            173 pgs stuck unclean
     monmap e2: 5 mons at {ca-cn1=10.50.5.117:6789/0,ca-cn2=10.50.5.118:6789/0,ca-cn3=10.50.5.119:6789/0,ca-cn4=10.50.5.120:6789/0,ca-cn5=10.50.5.121:6789/0}
            election epoch 106, quorum 0,1,2,3,4 ca-cn1,ca-cn2,ca-cn3,ca-cn4,ca-cn5
        mgr active: ca-cn1 standbys: ca-cn2, ca-cn4, ca-cn5, ca-cn3
     osdmap e1128: 60 osds: 59 up, 59 in; 173 remapped pgs
            flags sortbitwise,require_jewel_osds,require_kraken_osds
      pgmap v782747: 2048 pgs, 1 pools, 63133 GB data, 46293 kobjects
            85199 GB used, 238 TB / 322 TB avail
                1868 active+clean
                 173 remapped+incomplete
                   7 active+clean+scrubbing
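Here is a back-of-the-envelope check on the status output, as a minimal Python sketch. With 2048 PGs, 5 shards per PG and 60 OSDs, an evenly spread cluster has about 171 PGs with a shard on any one OSD. That is close to the 173 stuck PGs, which is consistent with roughly every PG that had a shard on the down OSD being affected. Whether such a PG, left with 4 of its 5 shards, can go active depends on the pool's `min_size`. The sketch assumes either `k` or `k+1`; the real value can be read with `ceph osd pool get <pool> min_size`. The helper names are hypothetical.

```python
K, M = 4, 1                 # EC 4+1 profile from the post
NUM_PGS, NUM_OSDS = 2048, 60


def shards_left(k: int, m: int, osds_lost: int) -> int:
    """Shards of one PG still available after losing osds_lost of its OSDs."""
    return k + m - osds_lost


def can_go_active(k: int, m: int, osds_lost: int, min_size: int) -> bool:
    """A PG needs >= k shards to decode data and >= min_size shards to serve I/O."""
    left = shards_left(k, m, osds_lost)
    return left >= k and left >= min_size


# Average number of PGs with a shard on any single OSD (assumes an even CRUSH spread).
pgs_per_osd = NUM_PGS * (K + M) / NUM_OSDS
print(f"expected PGs touching one OSD: {pgs_per_osd:.1f}")  # 170.7

# min_size is an assumption here: compare k against k+1.
for min_size in (K, K + 1):
    print(f"min_size={min_size}: active with one OSD down -> "
          f"{can_go_active(K, M, 1, min_size)}")
```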

MON log:

2017-01-20 09:25:54.715684 7f55bcafb700  0 log_channel(cluster) log [INF] : osd.54 out (down for 31.703786)
2017-01-20 09:25:54.725688 7f55bf4d5700  0 mon.ca-cn1@0(leader).osd e1120 crush map has features 288250512065953792, adjusting msgr requires
2017-01-20 09:25:54.729019 7f55bf4d5700  0 log_channel(cluster) log [INF] : osdmap e1120: 60 osds: 59 up, 59 in
2017-01-20 09:25:54.735987 7f55bf4d5700  0 log_channel(cluster) log [INF] : pgmap v781993: 2048 pgs: 1869 active+clean, 173 incomplete, 6 active+clean+scrubbing; 63159 GB data, 85201 GB used, 238 TB / 322 TB avail; 21825 B/s rd, 163 MB/s wr, 2046 op/s
2017-01-20 09:25:55.737749 7f55bf4d5700  0 mon.ca-cn1@0(leader).osd e1121 crush map has features 288250512065953792, adjusting msgr requires
2017-01-20 09:25:55.744338 7f55bf4d5700  0 log_channel(cluster) log [INF] : osdmap e1121: 60 osds: 59 up, 59 in
2017-01-20 09:25:55.749616 7f55bf4d5700  0 log_channel(cluster) log [INF] : pgmap v781994: 2048 pgs: 29 remapped+incomplete, 1869 active+clean, 144 incomplete, 6 active+clean+scrubbing; 63159 GB data, 85201 GB used, 238 TB / 322 TB avail; 44503 B/s rd, 45681 kB/s wr, 518 op/s
2017-01-20 09:25:56.768721 7f55bf4d5700  0 log_channel(cluster) log [INF] : pgmap v781995: 2048 pgs: 47 remapped+incomplete, 1869 active+clean, 126 incomplete, 6 active+clean+scrubbing; 63159 GB data, 85201 GB used, 238 TB / 322 TB avail; 20275 B/s rd, 72742 kB/s wr, 665 op/s
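The pgmap lines above can be tallied with a small, hypothetical Python helper. It confirms that the stuck PGs only move between `incomplete` and `remapped+incomplete` and do not recover. The helper assumes the `pgmap vN: T pgs: <count> <state>, ...;` layout seen in this log and treats any state without `active` as inactive.

```python
import re
from collections import Counter

# Matches "pgmap v<version>: <total> pgs: <count> <state>, ...;" as in the log above.
PGMAP_RE = re.compile(r"pgmap v(\d+): (\d+) pgs: ([^;]+);")


def parse_pgmap(line: str):
    """Return (version, total_pgs, Counter of state -> count), or None if no match."""
    m = PGMAP_RE.search(line)
    if m is None:
        return None
    states = Counter()
    for part in m.group(3).split(","):
        count, state = part.strip().split(" ", 1)
        states[state] += int(count)
    return int(m.group(1)), int(m.group(2)), states


def inactive_count(states: Counter) -> int:
    """Number of PGs whose state does not include the 'active' flag."""
    return sum(n for s, n in states.items() if "active" not in s.split("+"))


# pgmap portions of the three MON log lines in this post.
LINES = [
    "pgmap v781993: 2048 pgs: 1869 active+clean, 173 incomplete, 6 active+clean+scrubbing; 63159 GB data",
    "pgmap v781994: 2048 pgs: 29 remapped+incomplete, 1869 active+clean, 144 incomplete, 6 active+clean+scrubbing; 63159 GB data",
    "pgmap v781995: 2048 pgs: 47 remapped+incomplete, 1869 active+clean, 126 incomplete, 6 active+clean+scrubbing; 63159 GB data",
]

for line in LINES:
    version, total, states = parse_pgmap(line)
    assert sum(states.values()) == total, "state counts should add up to the PG total"
    print(f"v{version}: {inactive_count(states)} of {total} PGs inactive")
```

For the three log lines, each summary adds up to 2048 PGs and the inactive count stays at 173.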

Thanks,
Muthu

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com