After a long delay due to my research schedule here at UCSD, I have made a
patch that creates the /proc/net/tcphealth file. The file lists every
established TCP connection and reports a few health metrics for each one.
See http://heron.ucsd.edu/tcphealth.php for more information.

This patch is against kernel version 2.4.3 and was made with the command
'diff -Naur pristine-linux-2.4.3 linux-2.4.3'. Patches for other versions
are available upon request.

I believe this patch would make a useful addition to the Linux kernel.
Below is my correspondence with all of you from this past spring, most
recent first.

Sincerely,
Federico David Sacerdoti,
UCSD CSE department, San Diego CA

-------------

Hello!

> Would a patch for 2.4.2 be helpful?

Yes, of course. The tool is useful not depending on any circumstances.

Alexey

-------------

On Fri, Mar 23, 2001 at 09:19:14PM +0100, Federico David Sacerdoti wrote:
> The external monitoring made possible by the /proc/net/tcphealth is
> interesting because the SRTT is proportional to the speed of one's
> network connection, and duplicate acks indicate that packets are being
> lost (or reordered, less likely) somewhere in the network.

2.4 has a special state machine to detect reordering when the connection
supports timestamps.

I guess some long term statistics (currently TCP_INFO only dumps current
state) would be useful too, but it's David's call if he wants to put in
the few cycles that'll cost (probably only in slow paths anyways).

I guess it would be better if you would put it into the existing TCP_INFO
framework, perhaps with an additional /proc frontend to TCP_INFO. Having
two ways to do a similar thing is not good.

-Andi

--------------

Date: Fri, 23 Mar 2001 12:19:14 -0800
From: Federico David Sacerdoti <fds@cs.ucsd.edu>

The external monitoring made possible by the /proc/net/tcphealth is
interesting because the SRTT is proportional to the speed of one's
network connection, and duplicate acks indicate that packets are being
lost (or reordered, less likely) somewhere in the network. These are
things we want to know about a connection we are trying to communicate
on - its individual latency and how often packets are being lost over it.

Would a patch for 2.4.2 be helpful?

--------------

On Fri, Mar 23, 2001 at 01:57:11AM +0100, David S. Miller wrote:
>
> See the TCP_INFO socket option we added to 2.4.x

Sadly TCP_INFO can not be used for external monitoring currently (at
least not without very bad and racy hacks to allow /proc to open sockets
in /proc/pid/fd)

-Andi

--------------

Date: Thu, 22 Mar 2001 16:53:44 -0800
From: Federico David Sacerdoti <fds@cs.ucsd.edu>

For a graduate network class at UCSD I implemented some TCP performance
monitors in the Linux TCP stack (ipv4). I have added a file to the proc
filesystem (/proc/net/tcphealth) that monitors the "health" of all tcp
connections on a machine.

The tcphealth file tracks smoothed Round-Trip-Times, duplicate acks, and
duplicate incoming packets for each established tcp connection. I believe
that there is lots of good monitoring information that can be gleaned
from this file. It works on all TCP connections without the cooperation
of the remote server.

In the code I have taken care not to disrupt the fast path in
tcp_rcv_established(), and generally have tried to step lightly. I have
patched kernel versions 2.2.14 and 2.2.16, and tested it on an ix86, a
SUN, and a PowerPC.

If there is any interest, I will submit the patch to the appropriate
maintainer.

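For anyone who wants to consume the file from user space, here is a
minimal sketch of a reader. It assumes the row layout written by
tcp_health_get_info() in the patch below (id, local address, remote
address, RttEst, AcksSent, DupAcksSent, PktsRecv, DupPktsRecv). The
struct and function names (tcphealth_row, tcphealth_parse_row,
tcphealth_dup_ack_permille, tcphealth_report) are hypothetical and are
not part of the patch.

```c
#include <stdio.h>

/* Hypothetical user-space view of one /proc/net/tcphealth data row. */
struct tcphealth_row {
	int id;
	char local[32];			/* "a.b.c.d:port" */
	char remote[32];		/* "a.b.c.d:port" */
	unsigned long rtt_est;		/* RttEst column, as printed */
	unsigned long acks_sent;
	unsigned long dup_acks_sent;
	unsigned long pkts_recv;
	unsigned long dup_pkts_recv;
};

/*
 * Parse one line of the file. Returns 0 for a data row and -1 for
 * anything else (the header lines, an empty line). On failure the
 * contents of *row are unspecified.
 */
int tcphealth_parse_row(const char *line, struct tcphealth_row *row)
{
	int n = sscanf(line, "%d: %31s %31s %lu %lu %lu %lu %lu",
		       &row->id, row->local, row->remote,
		       &row->rtt_est, &row->acks_sent, &row->dup_acks_sent,
		       &row->pkts_recv, &row->dup_pkts_recv);

	return n == 8 ? 0 : -1;
}

/* Duplicate ACKs per 1000 ACKs sent; 0 if no ACK has been sent yet. */
unsigned long tcphealth_dup_ack_permille(const struct tcphealth_row *row)
{
	if (row->acks_sent == 0)
		return 0;
	return (unsigned long)(((unsigned long long)row->dup_acks_sent * 1000ULL)
			       / row->acks_sent);
}

/* Print one summary line per data row read from `in`; returns the row count. */
int tcphealth_report(FILE *in, FILE *out)
{
	char line[256];
	struct tcphealth_row row;
	int rows = 0;

	while (fgets(line, sizeof(line), in) != NULL) {
		if (tcphealth_parse_row(line, &row) != 0)
			continue;	/* header or malformed line */
		fprintf(out, "%s -> %s: %lu dup ACKs per 1000 sent\n",
			row.local, row.remote,
			tcphealth_dup_ack_permille(&row));
		rows++;
	}
	return rows;
}
```

A caller would open the file with fopen("/proc/net/tcphealth", "r") and
pass the stream to tcphealth_report() together with stdout. The per-mille
ratio is only one possible summary; the raw counters are kept in the
struct so that other ratios can be computed the same way.
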
diff -Naur pristine-linux-2.4.3/Makefile linux-2.4.3/Makefile
--- pristine-linux-2.4.3/Makefile	Thu Aug  2 15:46:19 2001
+++ linux-2.4.3/Makefile	Thu Aug  2 16:06:53 2001
@@ -1,7 +1,7 @@
 VERSION = 2
 PATCHLEVEL = 4
 SUBLEVEL = 3
-EXTRAVERSION =
+EXTRAVERSION = -tcphealth
 
 KERNELRELEASE=$(VERSION).$(PATCHLEVEL).$(SUBLEVEL)$(EXTRAVERSION)
 
diff -Naur pristine-linux-2.4.3/include/net/sock.h linux-2.4.3/include/net/sock.h
--- pristine-linux-2.4.3/include/net/sock.h	Thu Aug  2 15:47:08 2001
+++ linux-2.4.3/include/net/sock.h	Thu Aug  2 16:02:36 2001
@@ -24,6 +24,7 @@
  *		Alan Cox	:	Eliminate low level recv/recvfrom
  *		David S. Miller	:	New socket lookup architecture.
  *		Steve Whitehouse:	Default routines for sock_ops
+ *		Federico David Sacerdoti :	Added TCP health counters.
  *
  *		This program is free software; you can redistribute it and/or
  *		modify it under the terms of the GNU General Public License
@@ -272,7 +273,8 @@
 		unsigned long timeout;	/* Currently scheduled timeout */
 		__u32	lrcvtime;	/* timestamp of last received data packet*/
 		__u16	last_seg_size;	/* Size of last incoming segment */
 		__u16	rcv_mss;	/* MSS used for delayed ACK decisions */
+		__u32	last_ack_sent;	/* sequence number of the last ack we sent. */
 	} ack;
 
 	/* Data for direct copy to user */
@@ -411,9 +413,18 @@
 	unsigned int		keepalive_time;	  /* time before keep alive takes place */
 	unsigned int		keepalive_intvl;  /* time interval between keep alive probes */
 	int			linger2;
+
+	/*
+	 * TCP health monitoring counters.
+	 */
+	__u32	dup_acks_sent;
+	__u32	dup_pkts_recv;
+	__u32	acks_sent;
+	__u32	pkts_recv;
+
 };
 
-
+
 /*
  * This structure really needs to be cleaned up.
  * Most of it is for TCP, and not used by any of
diff -Naur pristine-linux-2.4.3/net/ipv4/af_inet.c linux-2.4.3/net/ipv4/af_inet.c
--- pristine-linux-2.4.3/net/ipv4/af_inet.c	Thu Aug  2 15:47:15 2001
+++ linux-2.4.3/net/ipv4/af_inet.c	Thu Aug  2 16:02:36 2001
@@ -54,6 +54,7 @@
  *					Some other random speedups.
  *		Cyrus Durgin	:	Cleaned up file for kmod hacks.
 *		Andi Kleen	:	Fix inet_stream_connect TCP race.
+ *		Federico David Sacerdoti :	Added tcphealth proc file
  *
  *		This program is free software; you can redistribute it and/or
  *		modify it under the terms of the GNU General Public License
@@ -128,6 +129,7 @@
 extern int afinet_get_info(char *, char **, off_t, int);
 extern int tcp_get_info(char *, char **, off_t, int);
 extern int udp_get_info(char *, char **, off_t, int);
+extern int tcp_health_get_info(char *, char **, off_t, int);
 extern void ip_mc_drop_socket(struct sock *sk);
 
 #ifdef CONFIG_DLCI
@@ -474,7 +476,7 @@
 	 *  (ie. your servers still start up even if your ISDN link
 	 *  is temporarily down)
 	 */
-	if (sysctl_ip_nonlocal_bind == 0 &&
+	if (sysctl_ip_nonlocal_bind == 0 &&
 	    sk->protinfo.af_inet.freebind == 0 &&
 	    addr->sin_addr.s_addr != INADDR_ANY &&
 	    chk_addr_ret != RTN_LOCAL &&
@@ -1054,6 +1056,7 @@
 	proc_net_create ("sockstat", 0, afinet_get_info);
 	proc_net_create ("tcp", 0, tcp_get_info);
 	proc_net_create ("udp", 0, udp_get_info);
+	proc_net_create ("tcphealth", 0, tcp_health_get_info);
 #endif		/* CONFIG_PROC_FS */
 	return 0;
 }
diff -Naur pristine-linux-2.4.3/net/ipv4/proc.c linux-2.4.3/net/ipv4/proc.c
--- pristine-linux-2.4.3/net/ipv4/proc.c	Thu Aug  2 15:47:16 2001
+++ linux-2.4.3/net/ipv4/proc.c	Thu Aug  2 16:02:36 2001
@@ -26,6 +26,7 @@
  *		Andi Kleen	:	Add support for open_requests and
  *					split functions for more readibility.
 *		Andi Kleen	:	Add support for /proc/net/netstat
+ *		Federico David Sacerdoti :	Added support for /proc/net/tcphealth
  *
  *		This program is free software; you can redistribute it and/or
  *		modify it under the terms of the GNU General Public License
@@ -155,7 +156,7 @@
 	if (len > length)
 		len = length;
 	if (len < 0)
-		len = 0;
+		len = 0;
 
 	return len;
 }
@@ -212,3 +213,97 @@
 		len = 0;
 	return len;
 }
+
+/*
+ *	Output /proc/net/tcphealth
+ */
+#define LINESZ 128
+
+int tcp_health_get_info(char *buffer, char **start, off_t offset, int length)
+{
+	int len=0, i=0, num=0;
+	off_t pos=0, begin=0;
+	char tmpbuf[LINESZ+1], srcIP[32], destIP[32];
+
+	unsigned long dest, src, SmoothedRttEstimate,
+		AcksSent, DupAcksSent, PktsRecv, DupPktsRecv;
+	unsigned short destp, srcp;
+
+	len = sprintf(buffer,
+		"TCP Health Monitoring (established connections only)\n"
+		" -Duplicate ACKs indicate lost/reordered packets on the connection.\n"
+		" -Duplicate Packets Received show you should be using SACK (rare).\n"
+		" -RttEst estimates how long a packet takes on a round trip over the connection.\n"
+		"id Local Address Remote Address RttEst(ms) AcksSent "
+		"DupAcksSent PktsRecv DupPktsRecv\n");
+	pos=len;
+
+	/* Loop through established TCP connections */
+	local_bh_disable();
+	for (i=0; i < tcp_ehash_size; i++) {
+		struct tcp_ehash_bucket *head = &tcp_ehash[i];
+		struct sock *sk;
+		struct tcp_opt *tp;
+
+		read_lock(&head->lock);
+		for (sk=head->chain; sk; sk=sk->next) {
+			if (!TCP_INET_FAMILY(sk->family))
+				continue;
+			pos+=LINESZ;
+			if (pos <= offset)
+				continue;
+
+			dest = ntohl(sk->daddr);
+			src = ntohl(sk->rcv_saddr);
+			destp = ntohs(sk->dport);
+			srcp = ntohs(sk->sport);
+
+			tp = &(sk->tp_pinfo.af_tcp);
+			SmoothedRttEstimate = (tp->srtt >> 3);
+			AcksSent = tp->acks_sent;
+			DupAcksSent = tp->dup_acks_sent;
+			PktsRecv = tp->pkts_recv;
+			DupPktsRecv = tp->dup_pkts_recv;
+
+			sprintf(srcIP, "%lu.%lu.%lu.%lu:%u",
+				((src >> 24) & 0xFF), ((src >> 16) & 0xFF), ((src >> 8) & 0xFF), (src & 0xFF),
+				srcp);
+			sprintf(destIP, "%lu.%lu.%lu.%lu:%u",
+				((dest >> 24) & 0xFF), ((dest >> 16) & 0xFF), ((dest >> 8) & 0xFF), (dest & 0xFF),
+				destp);
+
+			sprintf(tmpbuf, "%d: %-21s %-21s "
+				"%8lu %8lu %8lu %8lu %8lu",
+				num,
+				srcIP,
+				destIP,
+				SmoothedRttEstimate,
+				AcksSent,
+				DupAcksSent,
+				PktsRecv,
+				DupPktsRecv
+				);
+
+			len += sprintf(buffer+len, "%-*s\n", LINESZ-1, tmpbuf);
+			if(pos >= offset+length) {
+				read_unlock(&head->lock);
+				goto out;
+			}
+			num++;
+		}
+		read_unlock(&head->lock);
+	}
+
+out:
+	local_bh_enable();
+
+	begin = len - (pos - offset);
+	*start = buffer + begin;
+	len -= begin;
+	if(len>length)
+		len = length;
+	if (len<0)
+		len = 0;
+	return len;
+}
+
diff -Naur pristine-linux-2.4.3/net/ipv4/tcp_input.c linux-2.4.3/net/ipv4/tcp_input.c
--- pristine-linux-2.4.3/net/ipv4/tcp_input.c	Thu Aug  2 15:47:16 2001
+++ linux-2.4.3/net/ipv4/tcp_input.c	Thu Aug  2 16:02:36 2001
@@ -60,6 +60,7 @@
  *		Pasi Sarolahti,
  *		Panu Kuhlberg:		Experimental audit of TCP (re)transmission
  *					engine. Lots of bugs are found.
+ *		Federico David Sacerdoti :	Added TCP health monitoring
  */
 
 #include <linux/config.h>
@@ -2489,6 +2490,8 @@
 		}
 
 		if (!after(TCP_SKB_CB(skb)->end_seq, tp->rcv_nxt)) {
+			/* Coarse retransmit inefficiency: this packet has been received twice. [tcphealth] */
+			tp->dup_pkts_recv++;
 			SOCK_DEBUG(sk, "ofo packet was already received \n");
 			__skb_unlink(skb, skb->list);
 			__kfree_skb(skb);
@@ -2584,6 +2587,10 @@
 		return;
 	}
 
+	/* A packet is a "duplicate" if it contains bytes we have already received. [tcphealth] */
+	if (before(TCP_SKB_CB(skb)->seq, tp->rcv_nxt))
+		tp->dup_pkts_recv++;
+
 	if (!after(TCP_SKB_CB(skb)->end_seq, tp->rcv_nxt)) {
 		/* A retransmit, 2nd most common case. Force an immediate ack. */
 		NET_INC_STATS_BH(DelayedACKLost);
@@ -3180,6 +3187,14 @@
 	 */
 
 	tp->saw_tstamp = 0;
+
+	/*
+	 * Tcp health monitoring is interested in
+	 * total per connection packet arrivals.
+	 * There is no way to avoid putting this in the fast
+	 * path.
+	 */
+	tp->pkts_recv++;
 
 	/*	pred_flags is 0xS?10 << 16 + snd_wnd
 	 *	if header_predition is to be made
diff -Naur pristine-linux-2.4.3/net/ipv4/tcp_output.c linux-2.4.3/net/ipv4/tcp_output.c
--- pristine-linux-2.4.3/net/ipv4/tcp_output.c	Thu Aug  2 15:47:16 2001
+++ linux-2.4.3/net/ipv4/tcp_output.c	Thu Aug  2 16:05:54 2001
@@ -33,6 +33,7 @@
  *		Andrea Arcangeli:	SYNACK carry ts_recent in tsecr.
  *		Cacophonix Gaul :	draft-minshall-nagle-01
  *		J Hadi Salim	:	ECN support
+ *		Federico David Sacerdoti :	Added TCP health monitoring
  *
  */
 
@@ -1269,9 +1270,16 @@
 		TCP_SKB_CB(buff)->flags = TCPCB_FLAG_ACK;
 		TCP_SKB_CB(buff)->sacked = 0;
 
+		/* If the rcv_nxt has not advanced since sending our last ACK, this is a duplicate. [tcphealth] */
+		if (tp->rcv_nxt == tp->ack.last_ack_sent)
+			tp->dup_acks_sent++;
+		/* Record the total number of acks sent on this connection [tcphealth]. */
+		tp->acks_sent++;
+
 		/* Send it off, this clears delayed acks for us. */
 		TCP_SKB_CB(buff)->seq = TCP_SKB_CB(buff)->end_seq = tcp_acceptable_seq(sk, tp);
 		TCP_SKB_CB(buff)->when = tcp_time_stamp;
+		tp->ack.last_ack_sent = tp->rcv_nxt;
 		tcp_transmit_skb(sk, buff);
 	}
 }
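
The duplicate-ACK rule in the tcp_output.c hunk above can be tried out in
user space. The following is only a sketch of that one comparison, not
kernel code: the struct ack_model and the model_* functions are
hypothetical names, and the model ignores everything else the stack does
(window updates, ACKs piggybacked on data, sequence number wrap-around).

```c
#include <stdint.h>

/* Hypothetical user-space stand-in for the fields the patch touches. */
struct ack_model {
	uint32_t rcv_nxt;	/* next sequence number expected from the peer */
	uint32_t last_ack_sent;	/* value of rcv_nxt when the previous ACK went out */
	uint32_t acks_sent;	/* total ACKs sent */
	uint32_t dup_acks_sent;	/* ACKs sent while rcv_nxt had not advanced */
};

/* Model the arrival of `len` bytes of in-order data. */
void model_receive_in_order(struct ack_model *m, uint32_t len)
{
	m->rcv_nxt += len;
}

/*
 * Model sending one ACK, using the same test as the patch: if rcv_nxt
 * has not moved since the last ACK, count this one as a duplicate.
 */
void model_send_ack(struct ack_model *m)
{
	if (m->rcv_nxt == m->last_ack_sent)
		m->dup_acks_sent++;
	m->acks_sent++;
	m->last_ack_sent = m->rcv_nxt;
}
```

For example, starting from rcv_nxt = 1000 and last_ack_sent = 0, two calls
to model_send_ack() followed by model_receive_in_order(&m, 500) and a third
model_send_ack() leave acks_sent at 3 and dup_acks_sent at 1, because only
the second ACK was sent without rcv_nxt having advanced.
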