Re: Three second connect delays

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Leo,

Thanks for the info...

----- "Leo" <neleo@xxxxxxx> wrote:

> Dear Nathan,
> 
> unfortunately nobody has answered to my first post, but at the same
> time there was another thread about 3sec timeouts. Therefore we have
> discussed the issue on this thread:
> 
> http://marc.info/?t=120655182600018&r=1&w=2
> 
> Please read this thread for further details. So far we have a
> workaround but no solution
> (http://marc.info/?l=linux-net&m=121803213124143&w=2).
> But AFAIK the current kernel versions are not affected anymore.
> 
> Hope this helps!
> 
> Kind regards,
> Leo
> 
> 
> Nathan Scott wrote:
> > Hi Leo,
> >
> > I came across your mail related to a 3 second connect
> > delay from early last year:
> > http://marc.info/?l=linux-net&m=120214156112338&w=2
> >
> > Did you ever get to the bottom of this?  We're also now
> > observing it in our environment (we're running RHEL5, so
> > yet another, and different, kernel version to you).
> >
> > thanks!
> >

I've been digging into this a little too, so it was good to read
of other peoples experiences and analysis of this issue so far.
I have observed it as recently as fc11 (2.6.29.6-217, so its not
clear whether it is fixed yet or not - I've not tried a current
mainline kernel as yet though).

I came across another workaround, and have some more information
that might help the networking folks zero in on this some more,
though I'm quickly bit out of my depth in some of the networking
code.

I've attached a slimmed-down and modified version of your test
case (leo.c).  With no arguments, this is your original test case
and it hits the problem very quickly.  With an argument, this now
sets SO_LINGER, which seems to avoid the problem...

$ gcc -g -Wall -o leo leo.c
$ ./leo
127.0.0.1(25): 3.000366 seconds elapsed after 262 iterations
$ ./leo 1
127.0.0.1(25): 1000 connections completed
127.0.0.1(25): 2000 connections completed
127.0.0.1(25): 3000 connections completed
127.0.0.1(25): 4000 connections completed
127.0.0.1(25): 5000 connections completed
127.0.0.1(25): 6000 connections completed
127.0.0.1(25): 7000 connections completed
127.0.0.1(25): 8000 connections completed
127.0.0.1(25): 9000 connections completed
127.0.0.1(25): 10000 connections completed
127.0.0.1(25): 11000 connections completed

I've also attached a systemtap script which I've been using to
attempt to find the source of the delay, and to follow the code
in the kernel thats executing when the test case fails.

Below is some of its output, from the tail of a failed run.  It
suggests to me that perhaps we have a race condition here, in
the cleanup following a close(2) on the socket file descriptor.
If we set SO_LINGER, then close() will wait "until all queued
messages for the socket have been successfully sent or the
linger timeout has  been  reached."  The systemtap trace points
toward inet_stream_connect() as the source of the 3sec delay -
it calls inet_wait_for_connect() which AIUI loops waiting for a
TCPF_SYN_SENT|TCPF_SYN_RECV state.  I'm guessing that somehow
the messages which would normally set those state are consumed
in a different part of the code when the race occurs.  It will
need someone with stronger net-fu than me to figure that out,
though.

Here's the output from the attached stap script, anyway, with a
few random thoughts interspersed in braces...

$ sudo stap socket.stap
...
sys_connect: elapsed 70us (started 1250728561153903us, ended 1250728561153973us): leo

[thats the end of the previous, non-failing connect(2) call)]

lock_sock started: 1250728561153977 ns
lock_sock: elapsed 1us (started 1250728561153977us, ended 1250728561153978us): leo
release_sock: elapsed 1us (started 1250728561153980us, ended 1250728561153981us): leo
lock_sock started: 1250728561153985 ns
lock_sock: elapsed 1us (started 1250728561153985us, ended 1250728561153986us): leo
sk_stream_wait_close: elapsed 1us (started 1250728561153993us, ended 1250728561153994us): leo
release_sock: elapsed 1us (started 1250728561153996us, ended 1250728561153997us): leo
tcp_close: elapsed 15us (started 1250728561153984us, ended 1250728561153999us): leo
inet_release: elapsed 18us (started 1250728561153983us, ended 1250728561154001us): leo
sock_close: elapsed 27us (started 1250728561153976us, ended 1250728561154003us): leo
lock_sock started: 1250728561154023 ns
lock_sock: elapsed 2us (started 1250728561154023us, ended 1250728561154025us): leo
tcp_connect: elapsed 14us (started 1250728561154032us, ended 1250728561154046us): leo
tcp_v4_connect: elapsed 20us (started 1250728561154029us, ended 1250728561154049us): leo
release_sock: elapsed 1us (started 1250728561154068us, ended 1250728561154069us): leo
schedule_timeout CALLED BY unknown, 9223372036854775807 timeo

[bzzt.  note the start timestamp on schedule() below - 125072856_4_...
compared to the end of tcp_v4_connect - 125072856_1_... -- 3 seconds.
And release_sock, called just before schedule_timout in the call to
inet_wait_for_connect has the ..._1_... timestamp - so, pretty sure we
lose the 3 seconds right there]

schedule: elapsed 6us (started 1250728564153581us, ended 1250728564153587us): leo
schedule_timeout CALLED BY inet_stream_connect, 9223372036854775807 rval
Returning from: 0xffffffff800637c9 : schedule_timeout+0x0/0xad
Returning to  : 0xffffffff800591e6 : inet_stream_connect+0x132/0x236
 0xffffffff8009df5d : autoremove_wake_function+0x0/0x2e
 0xffffffff8007e833 : kretprobe_trampoline_holder+0x0/0x2
 0xffffffff800b4153 : audit_syscall_entry+0x16e/0x1a1
 0xffffffff8007e833 : kretprobe_trampoline_holder+0x0/0x2
schedule_timeout: elapsed 69us (started 1250728564153545us, ended 1250728564153614us): leo
lock_sock started: 1250728564153619 ns
lock_sock: elapsed 2us (started 1250728564153620us, ended 1250728564153622us): leo
release_sock: elapsed 0us (started 1250728564153629us, ended 1250728564153629us): leo
inet_stream_connect: elapsed 2999617us (started 1250728561154023us, ended 1250728564153640us): leo

[note elapsed time above - 3sec somewhere inside inet_stream_connect,
but in the second half of that routine, as it calls tcp_v4_connect and
that shows an early-on timestamp at its end]

Returning from: 0xffffffff800590b4 : inet_stream_connect+0x0/0x236
Returning to  : 0xffffffff8020d9db : sys_connect+0x7e/0xae
 0xffffffff800b4153 : audit_syscall_entry+0x16e/0x1a1
 0xffffffff8007e833 : kretprobe_trampoline_holder+0x0/0x2

[we also don't see the final elapsed time for sys_connect now, as
my systemtap script calls exit as soon as any function takes more
than a second to complete].


This trace data, with Gabriels observation ( in this message:
http://marc.info/?l=linux-net&m=120716473303355&w=2 ) that "The
client network device "sees" the SYN/ACK packet, but it is not 
processed by the network stack", and the LINGER workaround, all
make me suspect a race somewhere thats consuming the extra SYN
somewhere that it perhaps shouldn't be, and the following call
to sys_connect ends up being a victim.

Anyway, I'm a bit outta my depth at this point and hope this can
help other, more knowledgeable folk looking into this problem!

cheers.

-- 
Nathan
#include <stdio.h>
#include <stdlib.h>
#include <ctype.h>
#include <unistd.h>
#include <fcntl.h>
#include <signal.h>
#include <sys/time.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <sys/wait.h>
#include <netinet/tcp.h>
#include <netinet/in.h>
#include <arpa/inet.h>

void linger(int sock, int ltime)
{
   struct linger linger;

   linger.l_onoff = 1;
   linger.l_linger = ltime;
   setsockopt(sock, SOL_SOCKET, SO_LINGER, (char *)&linger, sizeof(linger));
}

int main(int argc, char **argv)
{
   /* edit Server and Port */
   char *server = "127.0.0.1";
   int port = 25;

   int sock;
   struct sockaddr_in sock_addr;
   int count = 0;
   int lingertime = argc > 1 ? atoi(argv[1]) : 0;
   struct timeval tim;

   while(1) {
      gettimeofday(&tim, NULL);
      double t1 = tim.tv_sec + (tim.tv_usec/1000000.0);

      if ((sock = socket(PF_INET, SOCK_STREAM, IPPROTO_IP)) < 0) {
         fprintf(stderr, "ERROR: cannot create socket\n");
         exit(1);
      }

      memset(&sock_addr, 0, sizeof (sock_addr));
      sock_addr.sin_family = AF_INET;
      sock_addr.sin_addr.s_addr = inet_addr(server);
      sock_addr.sin_port = htons(port);
      if (connect(sock, (struct sockaddr *)&sock_addr, sizeof(sock_addr)) < 0) {
         fprintf(stderr, "ERROR: cannot connect to %s\n", server);
         exit(1);
      }

      if (argc > 1)
         linger(sock, lingertime);

      close(sock);

      gettimeofday(&tim, NULL);
      double t2=tim.tv_sec+(tim.tv_usec/1000000.0);
      if ((t2-t1) > 1) {
         printf("%s(%d): %.6lf seconds elapsed after %d iterations\n", server, port, t2-t1, count);
         exit(1);
      }

      count++;
      if (!(count % 1000))
         printf("%s: %d connections completed\n", server, count);
   }
   return 0;
}

Attachment: socket.stap
Description: Binary data


[Index of Archives]     [Netdev]     [Ethernet Bridging]     [Linux 802.1Q VLAN]     [Linux Wireless]     [Kernel Newbies]     [Security]     [Linux for Hams]     [Netfilter]     [Git]     [Bugtraq]     [Yosemite News and Information]     [MIPS Linux]     [ARM Linux]     [Linux RAID]     [Linux PCI]     [Linux Admin]     [Samba]

  Powered by Linux