A few months ago we discussed a problem with 3-second timeouts on TCP
connects (thread "question about 3sec timeouts with tcp").
Unfortunately we couldn't find a solution. Since then many people have
written to me asking for advice or further help, so I decided to post
the current state of our investigation.
First of all: the problem is not solved yet; we only have a workaround.
There is a default timeout value of three seconds in the network stack,
and it can be triggered for various reasons. Therefore I'm not sure
whether everyone who posted to the thread really had the same
problem ...
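To check whether a given setup shows the same symptom, you can time
repeated TCP connects and count the ones that take roughly three seconds.
Below is a minimal Python sketch. The function names and the 2.5 s
threshold are assumptions for illustration, not part of any existing tool.
It also assumes (not confirmed in this thread) that a 3-second connect
corresponds to a lost SYN being retransmitted after the stack's initial
retransmission timeout. Passing an IP address instead of a hostname keeps
DNS lookups out of the measurement.

```python
import socket
import time
from typing import Iterable, Tuple

# Assumed threshold: a connect taking at least this long is counted as a
# "3sec" stall. 2.5 s is chosen to sit just below the three-second value.
STALL_THRESHOLD_S = 2.5


def time_connect(host: str, port: int, timeout: float = 10.0) -> float:
    """Open one TCP connection and return how long it took, in seconds."""
    start = time.monotonic()
    # create_connection performs the TCP handshake (and a DNS lookup if
    # `host` is a name rather than an IP address).
    with socket.create_connection((host, port), timeout=timeout):
        pass
    return time.monotonic() - start


def count_stalls(durations: Iterable[float],
                 threshold: float = STALL_THRESHOLD_S) -> int:
    """Count connect durations at or above the stall threshold."""
    return sum(1 for d in durations if d >= threshold)


def probe(host: str, port: int, attempts: int) -> Tuple[int, int]:
    """Connect `attempts` times in a row; return (stalls, attempts)."""
    durations = [time_connect(host, port) for _ in range(attempts)]
    return count_stalls(durations), attempts
```

For example, `stalls, total = probe("192.0.2.10", 3306, 100000)` run from
the web server, where the address and port are placeholders for the
database server. A connect that fails outright (refused, or longer than
the 10 s timeout) raises an exception instead of being counted.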
Anyway, what we noticed is a dependency on the kernel version itself and
on the kernel versions of the two servers involved (web server and
database server). If both servers run the same kernel version, the 3sec
timeout occurs less frequently, and some kernel versions are not affected
at all. We tested the 2.6.23.x series (I think this was the most recent
kernel series at that time): all versions up to 2.6.23.3 are OK, all
versions from 2.6.23.4 on are bad. The changelog shows many changes in
the network stack between these two versions ... The 2.6.24.x and even
the 2.6.25.x kernels are not affected, although their changelogs give no
hint why. As you can see, it is a matter of trial and error.
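The version observations above can be encoded in a small helper, for
example to flag servers before rolling out a kernel. This is a minimal
Python sketch; the function names are hypothetical, and the range it
checks reflects only our test results (2.6.23.4 and later 2.6.23.x
releases), not a confirmed root cause.

```python
import re
from typing import Tuple


def parse_release(release: str) -> Tuple[int, ...]:
    """Turn a release string such as '2.6.23.4-smp' into (2, 6, 23, 4)."""
    match = re.match(r"(\d+(?:\.\d+)*)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def in_reported_bad_range(release: str) -> bool:
    """True for 2.6.23.4 and later 2.6.23.x releases, per our tests only."""
    version = parse_release(release)
    # Pad to four components so '2.6.23' compares as (2, 6, 23, 0).
    version = version + (0,) * (4 - len(version))
    return version[:3] == (2, 6, 23) and version[3] >= 4


def same_kernel(web_release: str, db_release: str) -> bool:
    """True if both servers report the same numeric kernel version.

    Local suffixes such as '-smp' are ignored by this comparison.
    """
    return parse_release(web_release) == parse_release(db_release)
```

On each server the local release string can come from
`platform.release()` (the same value `uname -r` prints). Versions we did
not test, such as 2.6.26.x, are simply reported as "not in the bad range",
which says nothing about whether they are actually affected.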
What we did was install the same 2.6.25.x kernel on all servers (we have
not yet tested the current 2.6.26.x kernels). This works well for us and
reduces the 3sec timeouts to a minimum (approximately 20 timeouts in four
million requests, i.e. about 0.0005 %). I think these 20 timeouts can be
ignored ...
If someone finds a solution for this issue, it would be nice to hear
about it ...
Kind regards,
Leo
--
To unsubscribe from this list: send the line "unsubscribe linux-net" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html