> On 12 September 2016 at 16:14, Василий Ангапов <angapov@xxxxxxxxx> wrote:
>
>
> Hello, colleagues!
>
> I have a Ceph Jewel cluster of 10 nodes (CentOS 7, kernel 4.7.0), 290 OSDs
> total with journals on SSDs. The network is 2x10Gb public and 2x10Gb cluster.
> I constantly see periodic slow requests followed by "wrongly marked me down"
> records in ceph.log like this:
>
> root@ed-ds-c171:[~]:$ grep "marked me down" /var/log/ceph/ceph.log | tail -n20
> 2016-09-12 12:26:58.818453 osd.167 10.144.66.176:6844/71769 1698 : cluster [WRN] map e82752 wrongly marked me down
> 2016-09-12 12:26:59.394144 osd.26 10.144.66.172:6866/6702 797 : cluster [WRN] map e82752 wrongly marked me down
> 2016-09-12 12:27:07.319486 osd.104 10.144.66.178:6810/24704 1903 : cluster [WRN] map e82759 wrongly marked me down
> 2016-09-12 12:27:08.573852 osd.213 10.144.66.180:6844/75655 1780 : cluster [WRN] map e82759 wrongly marked me down
> 2016-09-12 12:27:06.792145 osd.111 10.144.66.179:6808/21311 1071 : cluster [WRN] map e82758 wrongly marked me down
> 2016-09-12 12:27:07.228637 osd.188 10.144.66.174:6832/47910 2806 : cluster [WRN] map e82759 wrongly marked me down
> 2016-09-12 12:27:11.904581 osd.55 10.144.66.172:6852/6485 645 : cluster [WRN] map e82762 wrongly marked me down
> 2016-09-12 12:27:08.513199 osd.76 10.144.66.175:6824/6074 648 : cluster [WRN] map e82759 wrongly marked me down
> 2016-09-12 12:27:10.250008 osd.146 10.144.66.180:6802/8353 1739 : cluster [WRN] map e82761 wrongly marked me down
> 2016-09-12 12:27:35.815834 osd.141 10.144.66.174:6834/49042 3331 : cluster [WRN] map e82785 wrongly marked me down
> 2016-09-12 12:28:32.344378 osd.137 10.144.66.180:6812/27980 1572 : cluster [WRN] map e82795 wrongly marked me down
> 2016-09-12 13:13:20.891681 osd.102 10.144.66.174:6806/18929 2159 : cluster [WRN] map e82808 wrongly marked me down
> 2016-09-12 13:13:22.007868 osd.205 10.144.66.180:6846/76323 2034 : cluster [WRN] map e82810 wrongly marked me down
> 2016-09-12 13:13:22.776924 osd.77 10.144.66.176:6810/24750 1933 : cluster [WRN] map e82810 wrongly marked me down
> 2016-09-12 13:23:11.695542 osd.197 10.144.66.180:6828/58341 1931 : cluster [WRN] map e82824 wrongly marked me down
> 2016-09-12 13:27:21.894787 osd.169 10.144.66.175:6808/5958 321 : cluster [WRN] map e82840 wrongly marked me down
> 2016-09-12 13:27:40.011952 osd.142 10.144.66.178:6857/133781 2109 : cluster [WRN] map e82850 wrongly marked me down
> 2016-09-12 13:56:28.290493 osd.26 10.144.66.172:6866/6702 810 : cluster [WRN] map e82862 wrongly marked me down
> 2016-09-12 13:58:09.993764 osd.225 10.144.66.176:6804/14859 2502 : cluster [WRN] map e82876 wrongly marked me down
> 2016-09-12 13:58:51.077331 osd.28 10.144.66.171:6860/7240 2049 : cluster [WRN] map e82888 wrongly marked me down
>
> root@ed-ds-c171:[~]:$ for osd in `grep "marked me down" /var/log/ceph/ceph.log | awk '{print $3}' | cut -b 5-`; do ceph osd find $osd | grep host ; done | sort | uniq -c
>       4 "host": "ed-ds-c171",
>      14 "host": "ed-ds-c172",
>      12 "host": "ed-ds-c173",
>      14 "host": "ed-ds-c174",
>      16 "host": "ed-ds-c175",
>      16 "host": "ed-ds-c176",
>      10 "host": "ed-ds-c177",
>      16 "host": "ed-ds-c178",
>      13 "host": "ed-ds-c179",
>      21 "host": "ed-ds-c180",
>
> As you can see, the affected OSDs are spread almost evenly across the 10 nodes.
> Our network guys say that everything is OK on the switches: no errors in the
> logs and no errors on the interfaces.
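
One way to sanity-check the network from the Ceph nodes themselves, independent
of the switch logs, is to watch the per-NIC error/drop counters and run
don't-fragment pings at full MTU across the cluster network. This is only a
rough sketch; the interface name (bond1), the 10.144.126.171-180 peer addresses
and the -s 8972 payload size are guesses based on the config quoted further
down, so adjust them to the real setup:

# per-interface error and drop counters on each node (growing non-zero values are suspicious)
ip -s link show bond1
ethtool -S bond1 | grep -iE 'err|drop|discard'

# don't-fragment pings of maximum payload size to every other node over the
# cluster network (-s 8972 assumes MTU 9000; use 1472 for a 1500-byte MTU)
for ip in 10.144.126.{171..180}; do
    ping -M do -c 20 -q -s 8972 "$ip" | tail -n 2
done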
> My feeling is that something is definitely wrong with the network, but I
> cannot find direct evidence for it. How can I debug these issues?
>
> In the OSD logs I see the following messages right before "wrongly marked
> me down":
>
> 2016-09-12 07:38:08.933444 7fbbe695e700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7fbbdd14b700' had timed out after 15
> 2016-09-12 07:38:08.939339 7fbbe515b700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7fbbdd14b700' had timed out after 15
> 2016-09-12 07:38:08.939345 7fbbe695e700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7fbbdd14b700' had timed out after 15
> 2016-09-12 07:38:08.955960 7fbbe515b700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7fbbdd14b700' had timed out after 15
> 2016-09-12 07:38:08.955973 7fbbe695e700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7fbbdd14b700' had timed out after 15
> 2016-09-12 07:38:08.973254 7fbc38c34700 -1 osd.16 82013 heartbeat_check: no reply from osd.77 since back 2016-09-12 07:37:45.696870 front 2016-09-12 07:37:45.696870 (cutoff 2016-09-12 07:37:48.973243)
> 2016-09-12 07:38:08.973274 7fbc38c34700 -1 osd.16 82013 heartbeat_check: no reply from osd.137 since back 2016-09-12 07:37:26.055057 front 2016-09-12 07:37:26.055057 (cutoff 2016-09-12 07:37:48.973243)
> 2016-09-12 07:38:08.973280 7fbc38c34700 -1 osd.16 82013 heartbeat_check: no reply from osd.155 since back 2016-09-12 07:37:45.696870 front 2016-09-12 07:37:45.696870 (cutoff 2016-09-12 07:37:48.973243)
> 2016-09-12 07:38:08.973286 7fbc38c34700 -1 osd.16 82013 heartbeat_check: no reply from osd.170 since back 2016-09-12 07:37:45.696870 front 2016-09-12 07:37:45.696870 (cutoff 2016-09-12 07:37:48.973243)
>
> My ceph.conf:
>
> [global]
> fsid = 5ddb8aab-49b4-4a63-918e-33c569e3101e
> mon initial members = ed-ds-c171, ed-ds-c172, ed-ds-c173
> mon host = 10.144.66.171, 10.144.66.172, 10.144.66.173
> auth cluster required = cephx
> auth service required = cephx
> auth client required = cephx
> public network = 10.144.66.0/24
> cluster network = 10.144.126.0/24
> osd pool default size = 3
> osd pool default min size = 1
> osd max backfills = 1
> mon pg warn max per osd = 1000
> mon pg warn max object skew = 1000
> mon lease = 50
> mon lease renew interval = 30
> mon lease ack timeout = 100
> rbd default features = 3
> osd disk thread ioprio priority = 7
> osd disk thread ioprio class = idle
> osd crush update on start = false
> mon osd down out interval = 900
> osd recovery max active = 1
> osd op threads = 8
> mon osd min down reporters = 5
>
> My sysctl.conf:
>
> kernel.msgmnb = 65536
> kernel.msgmax = 65536
> kernel.shmmax = 68719476736
> kernel.shmall = 4294967296
> net.ipv4.ip_local_port_range = 1024 65535
> net.ipv4.tcp_fin_timeout = 15
> net.ipv4.tcp_tw_reuse=1
> net.ipv4.tcp_max_orphans = 131072
> net.core.somaxconn = 16384
> net.core.netdev_max_backlog = 16384
> net.ipv4.tcp_max_syn_backlog = 32768
> net.ipv4.tcp_max_tw_buckets = 524288
> kernel.panic = 180
> net.netfilter.nf_conntrack_max = 262144
> net.core.rmem_max = 56623104
> net.core.wmem_max = 56623104
> net.core.rmem_default = 56623104
> net.core.wmem_default = 56623104
> net.core.optmem_max = 40960
> net.ipv4.tcp_rmem = 4096 87380 56623104
> net.ipv4.tcp_wmem = 4096 65536 56623104
> net.core.netdev_max_backlog = 50000
> net.ipv4.tcp_max_tw_buckets = 2000000
> net.ipv4.tcp_tw_recycle = 0
> net.ipv4.tcp_tw_reuse = 0
> net.ipv4.tcp_fin_timeout = 10
> net.ipv4.tcp_slow_start_after_idle = 0
> net.ipv4.conf.all.send_redirects = 0
> net.ipv4.conf.all.accept_redirects = 0
> net.ipv4.conf.all.accept_source_route = 0
> fs.nr_open = 13109720
> fs.file-max = 13109720
> kernel.pid_max = 4194304
> vm.vfs_cache_pressure=400
> vm.min_free_kbytes=2097152
> net.ipv6.conf.all.disable_ipv6 = 1
> net.ipv6.conf.default.disable_ipv6 = 1
>
> Thanks a lot for any help!

Can you try reverting all the TCP settings, at least? I would try that first,
since it looks like 'tuning because we can' to me. I've seen various issues
with TCP settings in regard to Ceph.

Wido
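
If it helps, here is a rough way to do that on CentOS 7. It is only a sketch:
it assumes all the TCP tuning lives in /etc/sysctl.conf exactly as quoted
above, and the kernel defaults only come back after a reboot (or after setting
each value back explicitly):

# back up the file and comment out every net.* line so the kernel defaults apply again
cp /etc/sysctl.conf /etc/sysctl.conf.bak
sed -i 's/^net\./#net./' /etc/sysctl.conf

# check for override snippets that might re-apply the same values
grep -r '^net\.' /etc/sysctl.d/ 2>/dev/null

# the running values stay in effect until the node is rebooted or each one is
# restored by hand with sysctl -w, so a rolling reboot is the cleanest test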