Lots of "wrongly marked me down" messages

Hello, colleagues!

I have a Ceph Jewel cluster of 10 nodes (CentOS 7, kernel 4.7.0), 290
OSDs in total with journals on SSDs. The network is 2x10Gb public and
2x10Gb cluster.
I constantly see periodic slow requests followed by "wrongly marked
me down" records in ceph.log, like this:

root@ed-ds-c171:[~]:$ grep "marked me down" /var/log/ceph/ceph.log | tail -n20
2016-09-12 12:26:58.818453 osd.167 10.144.66.176:6844/71769 1698 :
cluster [WRN] map e82752 wrongly marked me down
2016-09-12 12:26:59.394144 osd.26 10.144.66.172:6866/6702 797 :
cluster [WRN] map e82752 wrongly marked me down
2016-09-12 12:27:07.319486 osd.104 10.144.66.178:6810/24704 1903 :
cluster [WRN] map e82759 wrongly marked me down
2016-09-12 12:27:08.573852 osd.213 10.144.66.180:6844/75655 1780 :
cluster [WRN] map e82759 wrongly marked me down
2016-09-12 12:27:06.792145 osd.111 10.144.66.179:6808/21311 1071 :
cluster [WRN] map e82758 wrongly marked me down
2016-09-12 12:27:07.228637 osd.188 10.144.66.174:6832/47910 2806 :
cluster [WRN] map e82759 wrongly marked me down
2016-09-12 12:27:11.904581 osd.55 10.144.66.172:6852/6485 645 :
cluster [WRN] map e82762 wrongly marked me down
2016-09-12 12:27:08.513199 osd.76 10.144.66.175:6824/6074 648 :
cluster [WRN] map e82759 wrongly marked me down
2016-09-12 12:27:10.250008 osd.146 10.144.66.180:6802/8353 1739 :
cluster [WRN] map e82761 wrongly marked me down
2016-09-12 12:27:35.815834 osd.141 10.144.66.174:6834/49042 3331 :
cluster [WRN] map e82785 wrongly marked me down
2016-09-12 12:28:32.344378 osd.137 10.144.66.180:6812/27980 1572 :
cluster [WRN] map e82795 wrongly marked me down
2016-09-12 13:13:20.891681 osd.102 10.144.66.174:6806/18929 2159 :
cluster [WRN] map e82808 wrongly marked me down
2016-09-12 13:13:22.007868 osd.205 10.144.66.180:6846/76323 2034 :
cluster [WRN] map e82810 wrongly marked me down
2016-09-12 13:13:22.776924 osd.77 10.144.66.176:6810/24750 1933 :
cluster [WRN] map e82810 wrongly marked me down
2016-09-12 13:23:11.695542 osd.197 10.144.66.180:6828/58341 1931 :
cluster [WRN] map e82824 wrongly marked me down
2016-09-12 13:27:21.894787 osd.169 10.144.66.175:6808/5958 321 :
cluster [WRN] map e82840 wrongly marked me down
2016-09-12 13:27:40.011952 osd.142 10.144.66.178:6857/133781 2109 :
cluster [WRN] map e82850 wrongly marked me down
2016-09-12 13:56:28.290493 osd.26 10.144.66.172:6866/6702 810 :
cluster [WRN] map e82862 wrongly marked me down
2016-09-12 13:58:09.993764 osd.225 10.144.66.176:6804/14859 2502 :
cluster [WRN] map e82876 wrongly marked me down
2016-09-12 13:58:51.077331 osd.28 10.144.66.171:6860/7240 2049 :
cluster [WRN] map e82888 wrongly marked me down

root@ed-ds-c171:[~]:$ for osd in `grep "marked me down" /var/log/ceph/ceph.log | awk '{print $3}' | cut -b 5-`; do ceph osd find $osd | grep host ; done | sort | uniq -c
      4         "host": "ed-ds-c171",
     14         "host": "ed-ds-c172",
     12         "host": "ed-ds-c173",
     14         "host": "ed-ds-c174",
     16         "host": "ed-ds-c175",
     16         "host": "ed-ds-c176",
     10         "host": "ed-ds-c177",
     16         "host": "ed-ds-c178",
     13         "host": "ed-ds-c179",
     21         "host": "ed-ds-c180",

As you can see, the affected OSDs are almost evenly distributed
across all 10 nodes. Our network guys say that everything is OK on
the switches, meaning no errors in the logs and no errors on the
interfaces.
My feeling is that something is definitely wrong with the network,
but I cannot find direct evidence of it. How can I debug these issues?
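
So far the only idea I have is to test the public and cluster paths
directly between two nodes. A rough sketch of what I mean (the
10.144.126.x addresses are just what our cluster network should look
like for those hosts, and iperf3 is not installed by default on
CentOS 7):

# on ed-ds-c180: listen on its cluster network address
iperf3 -s -B 10.144.126.180
# on ed-ds-c171: push traffic over the cluster network and watch for retransmits
iperf3 -c 10.144.126.180 -B 10.144.126.171 -t 60 -P 4
# quick latency/loss check over the cluster network (as root)
ping -f -c 10000 -I 10.144.126.171 10.144.126.180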

In the OSD logs I see the following messages right before "wrongly
marked me down":

2016-09-12 07:38:08.933444 7fbbe695e700  1 heartbeat_map is_healthy
'OSD::osd_op_tp thread 0x7fbbdd14b700' had timed out after 15
2016-09-12 07:38:08.939339 7fbbe515b700  1 heartbeat_map is_healthy
'OSD::osd_op_tp thread 0x7fbbdd14b700' had timed out after 15
2016-09-12 07:38:08.939345 7fbbe695e700  1 heartbeat_map is_healthy
'OSD::osd_op_tp thread 0x7fbbdd14b700' had timed out after 15
2016-09-12 07:38:08.955960 7fbbe515b700  1 heartbeat_map is_healthy
'OSD::osd_op_tp thread 0x7fbbdd14b700' had timed out after 15
2016-09-12 07:38:08.955973 7fbbe695e700  1 heartbeat_map is_healthy
'OSD::osd_op_tp thread 0x7fbbdd14b700' had timed out after 15
2016-09-12 07:38:08.973254 7fbc38c34700 -1 osd.16 82013 heartbeat_check: no reply from osd.77 since back 2016-09-12 07:37:45.696870 front 2016-09-12 07:37:45.696870 (cutoff 2016-09-12 07:37:48.973243)
2016-09-12 07:38:08.973274 7fbc38c34700 -1 osd.16 82013 heartbeat_check: no reply from osd.137 since back 2016-09-12 07:37:26.055057 front 2016-09-12 07:37:26.055057 (cutoff 2016-09-12 07:37:48.973243)
2016-09-12 07:38:08.973280 7fbc38c34700 -1 osd.16 82013 heartbeat_check: no reply from osd.155 since back 2016-09-12 07:37:45.696870 front 2016-09-12 07:37:45.696870 (cutoff 2016-09-12 07:37:48.973243)
2016-09-12 07:38:08.973286 7fbc38c34700 -1 osd.16 82013 heartbeat_check: no reply from osd.170 since back 2016-09-12 07:37:45.696870 front 2016-09-12 07:37:45.696870 (cutoff 2016-09-12 07:37:48.973243)
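
I guess the next thing to check is whether these missing heartbeat
replies cluster around particular peer OSDs or hosts, the same way I
counted the "marked me down" entries above. Something like this
should do it (assuming the default log location; osd.16 is just the
OSD from the excerpt above):

# which peers does osd.16 most often fail to hear back from?
grep "heartbeat_check: no reply" /var/log/ceph/ceph-osd.16.log | awk '{print $11}' | sort | uniq -c | sort -rn
# map those peers to their hosts, same as the earlier one-liner
for osd in `grep "heartbeat_check: no reply" /var/log/ceph/ceph-osd.16.log | awk '{print $11}' | cut -b 5-`; do ceph osd find $osd | grep host ; done | sort | uniq -c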

My ceph.conf:

[global]
fsid = 5ddb8aab-49b4-4a63-918e-33c569e3101e
mon initial members = ed-ds-c171, ed-ds-c172, ed-ds-c173
mon host = 10.144.66.171, 10.144.66.172, 10.144.66.173
auth cluster required = cephx
auth service required = cephx
auth client required = cephx
public network = 10.144.66.0/24
cluster network = 10.144.126.0/24
osd pool default size = 3
osd pool default min size = 1
osd max backfills = 1
mon pg warn max per osd = 1000
mon pg warn max object skew = 1000
mon lease = 50
mon lease renew interval = 30
mon lease ack timeout = 100
rbd default features = 3
osd disk thread ioprio priority = 7
osd disk thread ioprio class = idle
osd crush update on start = false
mon osd down out interval = 900
osd recovery max active = 1
osd op threads = 8
mon osd min down reporters = 5
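
There are no heartbeat options in there, so the heartbeat grace and
interval should still be at their defaults. To double-check the
effective values on a running OSD I was going to query the admin
socket, something like this (osd.16 just as an example):

# show the running heartbeat and op-thread timeout settings for one OSD
ceph daemon osd.16 config show | grep -E "heartbeat|osd_op_thread"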

My sysctl.conf:
kernel.msgmnb = 65536
kernel.msgmax = 65536
kernel.shmmax = 68719476736
kernel.shmall = 4294967296
net.ipv4.ip_local_port_range = 1024 65535
net.ipv4.tcp_fin_timeout = 15
net.ipv4.tcp_tw_reuse=1
net.ipv4.tcp_max_orphans = 131072
net.core.somaxconn = 16384
net.core.netdev_max_backlog = 16384
net.ipv4.tcp_max_syn_backlog = 32768
net.ipv4.tcp_max_tw_buckets = 524288
kernel.panic = 180
net.netfilter.nf_conntrack_max = 262144
net.core.rmem_max = 56623104
net.core.wmem_max = 56623104
net.core.rmem_default = 56623104
net.core.wmem_default = 56623104
net.core.optmem_max = 40960
net.ipv4.tcp_rmem = 4096 87380 56623104
net.ipv4.tcp_wmem = 4096 65536 56623104
net.core.netdev_max_backlog = 50000
net.ipv4.tcp_max_tw_buckets = 2000000
net.ipv4.tcp_tw_recycle = 0
net.ipv4.tcp_tw_reuse = 0
net.ipv4.tcp_fin_timeout = 10
net.ipv4.tcp_slow_start_after_idle = 0
net.ipv4.conf.all.send_redirects = 0
net.ipv4.conf.all.accept_redirects = 0
net.ipv4.conf.all.accept_source_route = 0
fs.nr_open = 13109720
fs.file-max = 13109720
kernel.pid_max = 4194304
vm.vfs_cache_pressure=400
vm.min_free_kbytes=2097152
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
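
Since the switch side looks clean, I also want to rule out drops on
the hosts themselves before blaming the network. Roughly what I plan
to look at on each node (eth0 here is just a placeholder for our
public/cluster interfaces):

ip -s link show eth0
ethtool -S eth0 | grep -iE "drop|err|miss"
netstat -s | grep -iE "retrans|overflow|prune|listen"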

Thanks a lot for any help!
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


