Hello:
According to my understanding, an OSD's heartbeat partners should only be chosen from OSDs that share at least one PG with it.
As shown below (# ceph osd tree), osd.10 and osd.0-6 cannot share any PG: they belong to different root trees, and no PG in my cluster maps across root trees (# ceph osd crush rule dump). So osd.0-6 should never become heartbeat partners of osd.10.
However, the osd.10 log below shows that its heartbeat partners include osd.0/1/2/5. Why is that?
Thanks for any help.
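(To cross-check the shared-PG claim above: a quick way to list which OSDs actually share PGs with osd.10 is the command below; the UP/ACTING columns of its output show the peer OSDs for every PG hosted on osd.10.)
# ceph pg ls-by-osd 10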
# osd.10 log
2019-11-20 09:21:50.431799 7fbb369fb700 -1 osd.10 7344 heartbeat_check: no reply from 10.13.6.162:6806 osd.2 since back 2019-11-20 09:21:19.979712 front 2019-11-20 09:21:19.979712 (cutoff 2019-11-20 09:21:30.431768)
2019-11-20 13:15:59.175060 7fbb369fb700 -1 osd.10 7357 heartbeat_check: no reply from 10.13.6.162:6806 osd.2 since back 2019-11-20 13:15:38.710424 front 2019-11-20 13:15:38.710424 (cutoff 2019-11-20 13:15:39.175058)
2019-11-20 13:15:59.175110 7fbb369fb700 -1 osd.10 7357 heartbeat_check: no reply from 10.13.6.160:6803 osd.0 since back 2019-11-20 13:15:38.710424 front 2019-11-20 13:15:38.710424 (cutoff 2019-11-20 13:15:39.175058)
2019-11-20 13:15:59.175118 7fbb369fb700 -1 osd.10 7357 heartbeat_check: no reply from 10.13.6.161:6803 osd.1 since back 2019-11-20 13:15:38.710424 front 2019-11-20 13:15:38.710424 (cutoff 2019-11-20 13:15:39.175058)
2019-11-21 02:52:24.656783 7fbb369fb700 -1 osd.10 7374 heartbeat_check: no reply from 10.13.6.158:6810 osd.5 since back 2019-11-21 02:52:04.557548 front 2019-11-21 02:52:04.557548 (cutoff 2019-11-21 02:52:04.656781)
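(In case it helps to reproduce: I believe the peer set osd.10 is actually using should show up in its own log when debug_osd is raised on that daemon; the debug level needed and the default log path are assumptions on my part.)
# ceph daemon osd.10 config set debug_osd 20/20
# grep -i heartbeat /var/log/ceph/ceph-osd.10.log | tail
# ceph daemon osd.10 config set debug_osd 1/5      (restore the default afterwards)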
# ceph osd tree
-17       3.29095 root ssd-storage
-25       1.09698     rack rack-ssd-A
-18       1.09698         host ssd-osd01
 10   hdd 1.09698             osd.10          up  1.00000 1.00000
-26       1.09698     rack rack-ssd-B
-19       1.09698         host ssd-osd02
 11   hdd 1.09698             osd.11          up  1.00000 1.00000
-27       1.09698     rack rack-ssd-C
-20       1.09698         host ssd-osd03
 12   hdd 1.09698             osd.12          up  1.00000 1.00000
 -1       3.22256 root default
 -3       0.29300     host test-osd01
  0   hdd 0.29300         osd.0           up  1.00000 1.00000
 -5       0.29300     host test-osd02
  1   hdd 0.29300         osd.1           up  0.89999 1.00000
 -7       0.29300     host test-osd03
  2   hdd 0.29300         osd.2           up  0.79999 1.00000
 -9       0.29300     host test-osd04
  3   hdd 0.29300         osd.3           up  1.00000 1.00000
-11       0.29300     host test-osd05
  4   hdd 0.29300         osd.4           up  1.00000 1.00000
-13       0.29300     host test-osd06
  5   hdd 0.29300         osd.5           up  1.00000 1.00000
-15       0.29300     host test-osd07
  6   hdd 0.29300         osd.6           up  1.00000 1.00000
# ceph osd crush rule dump
[
    {
        "rule_id": 0,
        "rule_name": "replicated_rule",
        "ruleset": 0,
        "type": 1,
        "min_size": 1,
        "max_size": 10,
        "steps": [
            {
                "op": "take",
                "item": -1,
                "item_name": "default"
            },
            {
                "op": "chooseleaf_firstn",
                "num": 0,
                "type": "host"
            },
            {
                "op": "emit"
            }
        ]
    },
    {
        "rule_id": 1,
        "rule_name": "replicated_rule_ssd",
        "ruleset": 1,
        "type": 1,
        "min_size": 1,
        "max_size": 10,
        "steps": [
            {
                "op": "take",
                "item": -17,
                "item_name": "ssd-storage"
            },
            {
                "op": "chooseleaf_firstn",
                "num": 0,
                "type": "rack"
            },
            {
                "op": "emit"
            }
        ]
    }
]
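(To back up the claim that neither rule ever maps a PG across its root, the mappings can be simulated offline roughly as below; the output file name is arbitrary and --num-rep should match the pool size, assumed to be 3 here.)
# ceph osd getcrushmap -o crushmap.bin
# crushtool --test -i crushmap.bin --rule 0 --num-rep 3 --show-mappings
# crushtool --test -i crushmap.bin --rule 1 --num-rep 3 --show-mappings
Rule 0 should only ever pick osd.0-6, and rule 1 only osd.10-12.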
# some parameters
"mon_osd_down_out_interval": "600",
"mon_osd_down_out_subtree_limit": "rack",
"mds_debug_subtrees": "false",
"mon_osd_down_out_subtree_limit": "rack",
"mon_osd_reporter_subtree_level": "host",