> On 12 September 2016 at 16:14, Василий Ангапов <angapov@xxxxxxxxx> wrote:
>
>
> Hello, colleagues!
>
> I have a Ceph Jewel cluster of 10 nodes (CentOS 7, kernel 4.7.0), 290 OSDs
> total with journals on SSDs. The network is 2x10Gb public and 2x10Gb cluster.
> I constantly see periodic slow requests followed by "wrongly marked me down"
> records in ceph.log like this:
>
> root@ed-ds-c171:[~]:$ grep "marked me down" /var/log/ceph/ceph.log | tail -n20
> 2016-09-12 12:26:58.818453 osd.167 10.144.66.176:6844/71769 1698 : cluster [WRN] map e82752 wrongly marked me down
> 2016-09-12 12:26:59.394144 osd.26 10.144.66.172:6866/6702 797 : cluster [WRN] map e82752 wrongly marked me down
> 2016-09-12 12:27:07.319486 osd.104 10.144.66.178:6810/24704 1903 : cluster [WRN] map e82759 wrongly marked me down
> 2016-09-12 12:27:08.573852 osd.213 10.144.66.180:6844/75655 1780 : cluster [WRN] map e82759 wrongly marked me down
> 2016-09-12 12:27:06.792145 osd.111 10.144.66.179:6808/21311 1071 : cluster [WRN] map e82758 wrongly marked me down
> 2016-09-12 12:27:07.228637 osd.188 10.144.66.174:6832/47910 2806 : cluster [WRN] map e82759 wrongly marked me down
> 2016-09-12 12:27:11.904581 osd.55 10.144.66.172:6852/6485 645 : cluster [WRN] map e82762 wrongly marked me down
> 2016-09-12 12:27:08.513199 osd.76 10.144.66.175:6824/6074 648 : cluster [WRN] map e82759 wrongly marked me down
> 2016-09-12 12:27:10.250008 osd.146 10.144.66.180:6802/8353 1739 : cluster [WRN] map e82761 wrongly marked me down
> 2016-09-12 12:27:35.815834 osd.141 10.144.66.174:6834/49042 3331 : cluster [WRN] map e82785 wrongly marked me down
> 2016-09-12 12:28:32.344378 osd.137 10.144.66.180:6812/27980 1572 : cluster [WRN] map e82795 wrongly marked me down
> 2016-09-12 13:13:20.891681 osd.102 10.144.66.174:6806/18929 2159 : cluster [WRN] map e82808 wrongly marked me down
> 2016-09-12 13:13:22.007868 osd.205 10.144.66.180:6846/76323 2034 : cluster [WRN] map e82810 wrongly marked me down
> 2016-09-12 13:13:22.776924 osd.77 10.144.66.176:6810/24750 1933 : cluster [WRN] map e82810 wrongly marked me down
> 2016-09-12 13:23:11.695542 osd.197 10.144.66.180:6828/58341 1931 : cluster [WRN] map e82824 wrongly marked me down
> 2016-09-12 13:27:21.894787 osd.169 10.144.66.175:6808/5958 321 : cluster [WRN] map e82840 wrongly marked me down
> 2016-09-12 13:27:40.011952 osd.142 10.144.66.178:6857/133781 2109 : cluster [WRN] map e82850 wrongly marked me down
> 2016-09-12 13:56:28.290493 osd.26 10.144.66.172:6866/6702 810 : cluster [WRN] map e82862 wrongly marked me down
> 2016-09-12 13:58:09.993764 osd.225 10.144.66.176:6804/14859 2502 : cluster [WRN] map e82876 wrongly marked me down
> 2016-09-12 13:58:51.077331 osd.28 10.144.66.171:6860/7240 2049 : cluster [WRN] map e82888 wrongly marked me down
>
> root@ed-ds-c171:[~]:$ for osd in `grep "marked me down" /var/log/ceph/ceph.log | awk '{print $3}' | cut -b 5-`; do ceph osd find $osd | grep host ; done | sort | uniq -c
>       4 "host": "ed-ds-c171",
>      14 "host": "ed-ds-c172",
>      12 "host": "ed-ds-c173",
>      14 "host": "ed-ds-c174",
>      16 "host": "ed-ds-c175",
>      16 "host": "ed-ds-c176",
>      10 "host": "ed-ds-c177",
>      16 "host": "ed-ds-c178",
>      13 "host": "ed-ds-c179",
>      21 "host": "ed-ds-c180",
>
> As you can see, the affected OSDs are spread almost evenly across the 10 nodes.
> Our network guys say that everything is OK on the switches: no errors in the
> logs and no errors on the interfaces.
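
One way to sanity-check the network from the Ceph nodes themselves, independent
of the switch logs, is to watch the per-NIC error/drop counters and run
don't-fragment pings at full MTU across the cluster network. This is only a
rough sketch; the interface name (bond1), the 10.144.126.171-180 peer addresses
and the -s 8972 payload size are guesses based on the config quoted further
down, so adjust them to the real setup:

# per-interface error and drop counters on each node (growing non-zero values are suspicious)
ip -s link show bond1
ethtool -S bond1 | grep -iE 'err|drop|discard'

# don't-fragment pings of maximum payload size to every other node over the
# cluster network (-s 8972 assumes MTU 9000; use 1472 for a 1500-byte MTU)
for ip in 10.144.126.{171..180}; do
    ping -M do -c 20 -q -s 8972 "$ip" | tail -n 2
done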
> My feeling is that something is definitely wrong with the network, but I
> cannot find direct evidence for it. How can I debug these issues?
>
> In the OSD logs I see the following messages right before "wrongly marked
> me down":
>
> 2016-09-12 07:38:08.933444 7fbbe695e700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7fbbdd14b700' had timed out after 15
> 2016-09-12 07:38:08.939339 7fbbe515b700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7fbbdd14b700' had timed out after 15
> 2016-09-12 07:38:08.939345 7fbbe695e700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7fbbdd14b700' had timed out after 15
> 2016-09-12 07:38:08.955960 7fbbe515b700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7fbbdd14b700' had timed out after 15
> 2016-09-12 07:38:08.955973 7fbbe695e700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7fbbdd14b700' had timed out after 15
> 2016-09-12 07:38:08.973254 7fbc38c34700 -1 osd.16 82013 heartbeat_check: no reply from osd.77 since back 2016-09-12 07:37:45.696870 front 2016-09-12 07:37:45.696870 (cutoff 2016-09-12 07:37:48.973243)
> 2016-09-12 07:38:08.973274 7fbc38c34700 -1 osd.16 82013 heartbeat_check: no reply from osd.137 since back 2016-09-12 07:37:26.055057 front 2016-09-12 07:37:26.055057 (cutoff 2016-09-12 07:37:48.973243)
> 2016-09-12 07:38:08.973280 7fbc38c34700 -1 osd.16 82013 heartbeat_check: no reply from osd.155 since back 2016-09-12 07:37:45.696870 front 2016-09-12 07:37:45.696870 (cutoff 2016-09-12 07:37:48.973243)
> 2016-09-12 07:38:08.973286 7fbc38c34700 -1 osd.16 82013 heartbeat_check: no reply from osd.170 since back 2016-09-12 07:37:45.696870 front 2016-09-12 07:37:45.696870 (cutoff 2016-09-12 07:37:48.973243)
>
> My ceph.conf:
>
> [global]
> fsid = 5ddb8aab-49b4-4a63-918e-33c569e3101e
> mon initial members = ed-ds-c171, ed-ds-c172, ed-ds-c173
> mon host = 10.144.66.171, 10.144.66.172, 10.144.66.173
> auth cluster required = cephx
> auth service required = cephx
> auth client required = cephx
> public network = 10.144.66.0/24
> cluster network = 10.144.126.0/24
> osd pool default size = 3
> osd pool default min size = 1
> osd max backfills = 1
> mon pg warn max per osd = 1000
> mon pg warn max object skew = 1000
> mon lease = 50
> mon lease renew interval = 30
> mon lease ack timeout = 100
> rbd default features = 3
> osd disk thread ioprio priority = 7
> osd disk thread ioprio class = idle
> osd crush update on start = false
> mon osd down out interval = 900
> osd recovery max active = 1
> osd op threads = 8
> mon osd min down reporters = 5
>
> My sysctl.conf:
>
> kernel.msgmnb = 65536
> kernel.msgmax = 65536
> kernel.shmmax = 68719476736
> kernel.shmall = 4294967296
> net.ipv4.ip_local_port_range = 1024 65535
> net.ipv4.tcp_fin_timeout = 15
> net.ipv4.tcp_tw_reuse=1
> net.ipv4.tcp_max_orphans = 131072
> net.core.somaxconn = 16384
> net.core.netdev_max_backlog = 16384
> net.ipv4.tcp_max_syn_backlog = 32768
> net.ipv4.tcp_max_tw_buckets = 524288
> kernel.panic = 180
> net.netfilter.nf_conntrack_max = 262144
> net.core.rmem_max = 56623104
> net.core.wmem_max = 56623104
> net.core.rmem_default = 56623104
> net.core.wmem_default = 56623104
> net.core.optmem_max = 40960
> net.ipv4.tcp_rmem = 4096 87380 56623104
> net.ipv4.tcp_wmem = 4096 65536 56623104
> net.core.netdev_max_backlog = 50000
> net.ipv4.tcp_max_tw_buckets = 2000000
> net.ipv4.tcp_tw_recycle = 0
> net.ipv4.tcp_tw_reuse = 0
> net.ipv4.tcp_fin_timeout = 10
> net.ipv4.tcp_slow_start_after_idle = 0
> net.ipv4.conf.all.send_redirects = 0
> net.ipv4.conf.all.accept_redirects = 0
> net.ipv4.conf.all.accept_source_route = 0
> fs.nr_open = 13109720
> fs.file-max = 13109720
> kernel.pid_max = 4194304
> vm.vfs_cache_pressure=400
> vm.min_free_kbytes=2097152
> net.ipv6.conf.all.disable_ipv6 = 1
> net.ipv6.conf.default.disable_ipv6 = 1
>
> Thanks a lot for any help!

Can you try reverting all the TCP settings, at least? I would try that first,
since it looks like 'tuning because we can' to me. I've seen various issues
with TCP settings in regard to Ceph.

Wido
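
If it helps, here is a rough way to do that on CentOS 7. It is only a sketch:
it assumes all the TCP tuning lives in /etc/sysctl.conf exactly as quoted
above, and the kernel defaults only come back after a reboot (or after setting
each value back explicitly):

# back up the file and comment out every net.* line so the kernel defaults apply again
cp /etc/sysctl.conf /etc/sysctl.conf.bak
sed -i 's/^net\./#net./' /etc/sysctl.conf

# check for override snippets that might re-apply the same values
grep -r '^net\.' /etc/sysctl.d/ 2>/dev/null

# the running values stay in effect until the node is rebooted or each one is
# restored by hand with sysctl -w, so a rolling reboot is the cleanest test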