Re: question about 3sec timeouts with tcp

Joining this conversation and list mid-stream ... if this reply is 
outside the thread will someone please bounce it to the proper one?

Like Leo, we have experienced this 3000ms hang.  We have a network of
around 250 machines doing all kinds of activity: http, ftp, rsync via
ssh, mysql and nfs.  As our site grew, we noticed that an inordinate
number of our requests were taking 3010ms or 3100ms or even 6010ms.  We
went through the same motions (independently) that Leo did and finally
arrived at the same conclusions.  Regardless of protocol or port,
opening tcp connections in high-volume situations would cause SYN ACK
packets to get "lost", even with local, non-networked connections.

We have our own test script (in perl), similar to Leo's, that
generates the same results.  I'm happy to post it if anyone is interested.

Compiling and running Leo's code periodically generates 3000ms hangs
for us as well.
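
For anyone who wants to look for the same symptom without Leo's code or
our perl script, here is a minimal sketch in Python.  It is not either of
those programs; the function names, the 2900ms threshold and the 10s
socket timeout are all assumptions chosen for illustration.  It opens
connections one after another and flags any connect() that takes roughly
3 seconds or more, which would be consistent with a handshake packet
being retransmitted after an initial timeout of about 3 seconds.  Being
sequential, it may not reach the connection rates at which we see the
hangs; running several copies in parallel is one way to push the rate up.

```python
import socket
import time


def time_connects(host, port, count, timeout=10.0):
    """Open `count` TCP connections in sequence; return connect times in ms."""
    samples = []
    for _ in range(count):
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        sock.settimeout(timeout)
        start = time.monotonic()
        try:
            sock.connect((host, port))
            samples.append((time.monotonic() - start) * 1000.0)
        except OSError:
            # Refused or timed out: record as infinite so it is always flagged.
            samples.append(float("inf"))
        finally:
            sock.close()
    return samples


def find_stalls(samples_ms, threshold_ms=2900.0):
    """Return (index, ms) for every connect that took at least threshold_ms."""
    return [(i, ms) for i, ms in enumerate(samples_ms) if ms >= threshold_ms]
```

Typical use would be something like
find_stalls(time_connects("127.0.0.1", 80, 10000)) against whatever
service is listening locally; the host and port here are placeholders.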

We noticed that the phenomenon seems to happen at the rate of 4 per
1000 connections and only seems to happen when there are 500
connections a second.  This could be a spurious correlation, but it is
one we noticed nonetheless.
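
To put those two figures together, a small back-of-the-envelope
calculation (a sketch only; the function name is made up, and the 3
second cost per hang is taken from the delays described above):

```python
def stall_rate(conn_per_sec, stalls_per_1000):
    """Expected number of stalled connections per second."""
    return conn_per_sec * stalls_per_1000 / 1000.0


stalls_per_sec = stall_rate(500, 4)      # 2.0 stalled connections per second
added_delay = stalls_per_sec * 3.0       # 6.0 seconds of added wait per second
print(stalls_per_sec, added_delay)
```

In other words, at that load roughly two connections every second would
each be held up for about 3 seconds.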

We also noticed that every time one of these hangs occurred, the kernel
on the client side would increment the TCPTimeouts counter in
/proc/net/netstat.  Not terribly useful, but it might provide a starting
point.
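
A minimal sketch of how that counter could be sampled before and after a
test run, in Python.  It assumes the usual /proc/net/netstat layout of
paired lines (a line of counter names followed by a line of values, both
starting with the same prefix such as "TcpExt:") and assumes the field is
spelled TCPTimeouts; check the file on your own kernel if the lookup
returns None.  The function names are made up for illustration.

```python
def parse_netstat(text):
    """Parse /proc/net/netstat-style text into {prefix: {counter: value}}."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    stats = {}
    # Lines come in pairs: a header line of names, then a line of values.
    for header, values in zip(lines[0::2], lines[1::2]):
        prefix, names = header.split(":", 1)
        _, nums = values.split(":", 1)
        stats[prefix] = dict(zip(names.split(), (int(n) for n in nums.split())))
    return stats


def read_tcp_timeouts(path="/proc/net/netstat", field="TCPTimeouts"):
    """Return the current value of `field` under TcpExt, or None if absent."""
    with open(path) as fh:
        return parse_netstat(fh.read()).get("TcpExt", {}).get(field)
```

Calling read_tcp_timeouts() before and after a batch of connections and
subtracting gives the number of timeouts the client kernel counted during
the run, which can then be compared with the number of 3000ms hangs seen.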

We are running a pretty vanilla CentOS kernel:

2.6.17-1.2171_FC5_PDcentossmp

kernel ipv4/core config:

net.ipv4.icmp_echo_ignore_all: 0
net.ipv4.icmp_echo_ignore_broadcasts: 1
net.ipv4.icmp_errors_use_inbound_ifaddr: 0
net.ipv4.icmp_ignore_bogus_error_responses: 1
net.ipv4.icmp_ratelimit: 250
net.ipv4.icmp_ratemask: 6168
net.ipv4.igmp_max_memberships: 20
net.ipv4.igmp_max_msf: 10
net.ipv4.inet_peer_gc_maxtime: 120
net.ipv4.inet_peer_gc_mintime: 10
net.ipv4.inet_peer_maxttl: 600
net.ipv4.inet_peer_minttl: 120
net.ipv4.inet_peer_threshold: 65664
net.ipv4.ip_autoconfig: 0
net.ipv4.ip_default_ttl: 64
net.ipv4.ip_dynaddr: 0
net.ipv4.ip_forward: 0
net.ipv4.ipfrag_high_thresh: 262144
net.ipv4.ipfrag_low_thresh: 196608
net.ipv4.ipfrag_max_dist: 64
net.ipv4.ipfrag_secret_interval: 600
net.ipv4.ipfrag_time: 30
net.ipv4.ip_local_port_range: 32768      61000
net.ipv4.ip_nonlocal_bind: 0
net.ipv4.ip_no_pmtu_disc: 0
net.ipv4.neigh: cat: neigh: Is a directory
net.ipv4.route: cat: route: Is a directory
net.ipv4.tcp_abc: 1
net.ipv4.tcp_abort_on_overflow: 0
net.ipv4.tcp_adv_win_scale: 2
net.ipv4.tcp_app_win: 31
net.ipv4.tcp_base_mss: 512
net.ipv4.tcp_congestion_control: bic
net.ipv4.tcp_dsack: 1
net.ipv4.tcp_ecn: 0
net.ipv4.tcp_fack: 1
net.ipv4.tcp_fin_timeout: 60
net.ipv4.tcp_frto: 0
net.ipv4.tcp_keepalive_intvl: 75
net.ipv4.tcp_keepalive_probes: 9
net.ipv4.tcp_keepalive_time: 7200
net.ipv4.tcp_low_latency: 0
net.ipv4.tcp_max_orphans: 131072
net.ipv4.tcp_max_syn_backlog: 12288
net.ipv4.tcp_max_tw_buckets: 180000
net.ipv4.tcp_mem: 393216 524288  786432
net.ipv4.tcp_moderate_rcvbuf: 1
net.ipv4.tcp_mtu_probing: 0
net.ipv4.tcp_no_metrics_save: 0
net.ipv4.tcp_orphan_retries: 0
net.ipv4.tcp_reordering: 3
net.ipv4.tcp_retrans_collapse: 1
net.ipv4.tcp_retries1: 3
net.ipv4.tcp_retries2: 15
net.ipv4.tcp_rfc1337: 0
net.ipv4.tcp_rmem: 8192  873800  8738000
net.ipv4.tcp_sack: 1
net.ipv4.tcp_stdurg: 0
net.ipv4.tcp_synack_retries: 5
net.ipv4.tcp_syncookies: 1
net.ipv4.tcp_syn_retries: 5
net.ipv4.tcp_timestamps: 1
net.ipv4.tcp_tso_win_divisor: 3
net.ipv4.tcp_tw_recycle: 0
net.ipv4.tcp_tw_reuse: 0
net.ipv4.tcp_window_scaling: 1
net.ipv4.tcp_wmem: 4096  655360  6553600
net.ipv4.tcp_workaround_signed_windows: 0
net.core.dev_weight: 64
net.core.divert_version: 0.46
net.core.message_burst: 10
net.core.message_cost: 5
net.core.netdev_budget: 300
net.core.netdev_max_backlog: 3000
net.core.optmem_max: 10240
net.core.rmem_default: 262141
net.core.rmem_max: 262141
net.core.somaxconn: 3000
net.core.wmem_default: 262141
net.core.wmem_max: 262141
net.core.xfrm_aevent_etime: 10
net.core.xfrm_aevent_rseqth: 2

These hangs are significantly impacting performance; as an organization we are
focused on eliminating this problem and are happy to dedicate any
resources at our disposal to correct it.

Brett Paden
http://multiply.com
