Hi,
We are validating Kraken 11.2.0 with BlueStore on a 5-node cluster with EC 4+1.
When an OSD goes down, peering does not happen and the ceph health status moves to HEALTH_ERR after a few minutes. This worked in previous development releases. Is any additional configuration required in v11.2.0?
Following is our ceph configuration:
mon_osd_down_out_interval = 30
mon_osd_report_timeout = 30
mon_osd_down_out_subtree_limit = host
mon_osd_reporter_subtree_level = host
The recovery parameters are set to their defaults.
[root@ca-cn1 ceph]# ceph osd crush show-tunables
{
"choose_local_tries": 0,
"choose_local_fallback_tries": 0,
"choose_total_tries": 50,
"chooseleaf_descend_once": 1,
"chooseleaf_vary_r": 1,
"chooseleaf_stable": 1,
"straw_calc_version": 1,
"allowed_bucket_algs": 54,
"profile": "jewel",
"optimal_tunables": 1,
"legacy_tunables": 0,
"minimum_required_version": "jewel",
"require_feature_tunables": 1,
"require_feature_tunables2": 1,
"has_v2_rules": 1,
"require_feature_tunables3": 1,
"has_v3_rules": 0,
"has_v4_buckets": 0,
"require_feature_tunables5": 1,
"has_v5_rules": 0
}
ceph status:
health HEALTH_ERR
173 pgs are stuck inactive for more than 300 seconds
173 pgs incomplete
173 pgs stuck inactive
173 pgs stuck unclean
monmap e2: 5 mons at {ca-cn1=10.50.5.117:6789/0,ca-cn2=10.50.5.118:6789/0,ca-cn3=10.50.5.119:6789/0,ca-cn4=10.50.5.120:6789/0,ca-cn5=10.50.5.121:6789/0}
election epoch 106, quorum 0,1,2,3,4 ca-cn1,ca-cn2,ca-cn3,ca-cn4,ca-cn5
mgr active: ca-cn1 standbys: ca-cn2, ca-cn4, ca-cn5, ca-cn3
osdmap e1128: 60 osds: 59 up, 59 in; 173 remapped pgs
flags sortbitwise,require_jewel_osds,require_kraken_osds
pgmap v782747: 2048 pgs, 1 pools, 63133 GB data, 46293 kobjects
85199 GB used, 238 TB / 322 TB avail
1868 active+clean
173 remapped+incomplete
7 active+clean+scrubbing
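As a quick sanity check on the PG counts above, here is a minimal Python sketch that tallies the state lines and reports how many PGs are not active. The helper names (tally_pg_states, inactive_pgs) are made up for illustration and are not part of Ceph; the input is just the three state lines copied from the status output.

```python
import re

# PG state summary copied from the `ceph status` output above.
STATUS_LINES = """\
1868 active+clean
173 remapped+incomplete
7 active+clean+scrubbing
"""


def tally_pg_states(text):
    """Return {state: count} for lines of the form '<count> <state>'."""
    counts = {}
    for line in text.splitlines():
        m = re.match(r"\s*(\d+)\s+(\S+)\s*$", line)
        if m:
            state = m.group(2)
            counts[state] = counts.get(state, 0) + int(m.group(1))
    return counts


def inactive_pgs(counts):
    """Sum PGs whose '+'-separated state flags do not include 'active'."""
    return sum(n for state, n in counts.items()
               if "active" not in state.split("+"))


counts = tally_pg_states(STATUS_LINES)
total = sum(counts.values())
stuck = inactive_pgs(counts)
# Total PGs, inactive PGs, and the inactive share in percent.
print(total, stuck, round(100.0 * stuck / total, 1))
```

The three states add up to the 2048 PGs reported in the pgmap line, and the 173 PGs without the active flag match the "stuck inactive" count in the health summary.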
MON log:
2017-01-20 09:25:54.715684 7f55bcafb700 0 log_channel(cluster) log [INF] : osd.54 out (down for 31.703786)
2017-01-20 09:25:54.725688 7f55bf4d5700 0 mon.ca-cn1@0(leader).osd e1120 crush map has features 288250512065953792, adjusting msgr requires
2017-01-20 09:25:54.729019 7f55bf4d5700 0 log_channel(cluster) log [INF] : osdmap e1120: 60 osds: 59 up, 59 in
2017-01-20 09:25:54.735987 7f55bf4d5700 0 log_channel(cluster) log [INF] : pgmap v781993: 2048 pgs: 1869 active+clean, 173 incomplete, 6 active+clean+scrubbing; 63159 GB data, 85201 GB used, 238 TB / 322 TB avail; 21825 B/s rd, 163 MB/s wr, 2046 op/s
2017-01-20 09:25:55.737749 7f55bf4d5700 0 mon.ca-cn1@0(leader).osd e1121 crush map has features 288250512065953792, adjusting msgr requires
2017-01-20 09:25:55.744338 7f55bf4d5700 0 log_channel(cluster) log [INF] : osdmap e1121: 60 osds: 59 up, 59 in
2017-01-20 09:25:55.749616 7f55bf4d5700 0 log_channel(cluster) log [INF] : pgmap v781994: 2048 pgs: 29 remapped+incomplete, 1869 active+clean, 144 incomplete, 6 active+clean+scrubbing; 63159 GB data, 85201 GB used, 238 TB / 322 TB avail; 44503 B/s rd, 45681 kB/s wr, 518 op/s
2017-01-20 09:25:56.768721 7f55bf4d5700 0 log_channel(cluster) log [INF] : pgmap v781995: 2048 pgs: 47 remapped+incomplete, 1869 active+clean, 126 incomplete, 6 active+clean+scrubbing; 63159 GB data, 85201 GB used, 238 TB / 322 TB avail; 20275 B/s rd, 72742 kB/s wr, 665 op/s
Thanks,
Muthu