Hey, it can be tested in a matter of minutes.
If we have some test candidate I will write a small tproxy script to
verify the suspect.

Eliezer

On 09/14/2013 07:39 PM, Nikolai Gorchilov wrote:
> Hi, Eliezer,
>
> On Tue, Sep 10, 2013 at 1:49 AM, Eliezer Croitoru <eliezer@xxxxxxxxxxxx> wrote:
>> Hey Nikolai,
>>
>> I will try to make sense of what you have seen.
>> TPROXY is a very complex feature, and the kernel cannot bind a
>> duplicate src(ip:port) + dst(ip:port) pair.
>> Say, for example, the client 10.100.1.100 tries to connect to 2.3.4.5
>> on port 80.
>> The client tries once from:
>> 10.100.1.100:5455 to 2.3.4.5:80
>> Then, say the client doesn't have the right route and there is a
>> network problem, so it tries again from:
>> 10.100.1.100:5456 to 2.3.4.5:80
>> This client has an issue with the network, and the proxy knows that.
>> The proxy is transparent and needs to re-intercept the same request
>> twice; once the first connection has timed out at the kernel level,
>> the application can drop the connection and stop parsing the request.
>
> The problem I'm facing is not related to the user-to-proxy connection
> at all. With a proper network setup this works flawlessly.
>
> It's the proxy-to-server connection, when Squid tries to bind to an
> IP without specifying a port, thus leaving the kernel to choose one.
>
>> The kernel can bind the src ip:port to the dst if it knows that all
>> port 80 traffic uses only that one route.
>> If this is not the case, the client will have trouble, and binding
>> ip:port to ip:port at the network layer will be a disaster for a
>> couple of layers.
>
> Yeah! ip:port pairs have to be unique :-)
>
>> So the kernel manages what the bind will look like.
>> I don't see how a tproxy-enabled system with more than 10,000 clients
>> can reach a critical level of commBind failures unless the CPU and
>> all the lower levels of the kernel cannot handle this level of
>> traffic.
>
> It's not about the number of users, but the number of simultaneous
> live connections from the cache server. Keep in mind that "idle" HTTP
> connections are "live" TCP streams.
>
>> If it's the port range thing in the kernel, it can be reproduced in a
>> matter of seconds by lowering it.
>
> Exactly. Try something like echo 32768 32867 >
> /proc/sys/net/ipv4/ip_local_port_range and you'll start getting
> EADDRINUSE on the 101st parallel outbound connection of Squid.
>
>> This limit is not a rule for the application; it limits which
>> local ip:port the kernel will bind when the source machine is the
>> local machine.
>> It doesn't force the kernel to handle fewer connections, but lets it
>> do fewer lookups when trying to find a free ip:port to bind the new
>> connection to.
>>
>> It seems to me like you are using connection tracking on a tproxy
>> system that doesn't need to do connection tracking at all at this
>> kind of scale.
>> There is no reason for a tproxy system to keep track of client
>> connections for more than 5-10 minutes, tops.
>>
>> Try to look more into connection tracking rather than the basic
>> kernel land.
>
> Nope. The problem has nothing to do with TPROXY or connection
> tracking. It's the kernel's port auto-selection algorithm, which
> limits the number of live auto-selected ports to
> ip_local_port_range.max - ip_local_port_range.min.
>
> Here's some pseudocode to reproduce it, even with local addresses
> assigned to the host:
>
> ===[cut]===
> $broken = true; // ask the kernel to select the port
> $port_min = ip_local_port_range.min;
> $port_max = ip_local_port_range.max;
> $ips_to_test_with = ['aaa.aaa.aaa.aaa', 'bbb.bbb.bbb.bbb'];
>
> function socket_setup($ip, $port) {
>     $socket = new socket(AF_INET, SOCK_STREAM, SOL_TCP);
>     $socket.set_option(SOL_SOCKET, SO_REUSEADDR, 1);
>     // IP_TRANSPARENT is needed only if $ips_to_test_with are not
>     // assigned to the host
>     $socket.set_option(SOL_IP, IP_TRANSPARENT, 1);
>     $socket.bind($ip, $port);
>     // listen() is easier and faster for testing; we just have to park
>     // this socket in the kernel somehow. In real life it would be a
>     // $socket.connect().
>     $socket.listen();
>     return $socket;
> }
>
> for ($port = $port_min; $port <= $port_max; $port++) {
>     foreach ($ips_to_test_with as $ip) {
>         if ($broken) {
>             // throws EADDRINUSE on iteration
>             // floor(($port_max - $port_min + 1) / count($ips_to_test_with)) + 1
>             socket_setup($ip, 0);
>         } else {
>             // assigns all the ports, on every IP
>             socket_setup($ip, $port);
>         }
>     }
> }
>
> ===[cut]===
>
> That's it. Do echo 32768 32867 >
> /proc/sys/net/ipv4/ip_local_port_range and try it. Once with $broken =
> true, and then again with $broken = false.
>
> When $broken = true, you'll get EADDRINUSE on the 51st port assignment
> on IP address aaa.aaa.aaa.aaa.
> When $broken = false, you'll get both aaa.aaa.aaa.aaa and
> bbb.bbb.bbb.bbb listening on 100 ports each and no error.
>
> Hope this time it's more clear.
>
> Best,
> Niki
>
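
For reference, a minimal runnable sketch of the reproduction above, in Python.
It assumes the two test addresses (192.0.2.1 and 192.0.2.2 here, purely
illustrative) are already configured on the host, e.g. added to lo, so no
IP_TRANSPARENT or special privileges are needed, and that ip_local_port_range
has been shrunk to 32768 32867 as in the thread. Per the behaviour described
above, with BROKEN = True it should hit EADDRINUSE on the 51st iteration for
the first address; with BROKEN = False it should bind every port on both.

===[cut]===
import socket

# Hypothetical test addresses -- substitute two IPs configured on this host.
IPS = ["192.0.2.1", "192.0.2.2"]
PORT_MIN, PORT_MAX = 32768, 32867   # mirrors the ip_local_port_range used above
BROKEN = True                       # True: bind to port 0, let the kernel auto-select

def socket_setup(ip, port):
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    s.bind((ip, port))   # port 0 asks the kernel to pick an ephemeral port
    s.listen(1)          # listen() just parks the socket, as in the pseudocode
    return s

keep = []                # keep references so the sockets stay "live"
for i, port in enumerate(range(PORT_MIN, PORT_MAX + 1), start=1):
    for ip in IPS:
        try:
            keep.append(socket_setup(ip, 0 if BROKEN else port))
        except OSError as err:
            print(f"iteration {i}, {ip}: {err}")
            raise
===[cut]===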
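
And a sketch of how an application can sidestep the auto-selection limit,
along the lines of the $broken = false branch: walk the ephemeral range
yourself, per local IP, and retry on a conflict before connecting. The
helper name and the port bookkeeping are illustrative assumptions, not
anything Squid actually does.

===[cut]===
import errno
import itertools
import socket

PORT_MIN, PORT_MAX = 32768, 32867        # keep in sync with ip_local_port_range
_cursors = {}                            # per-local-IP rotating port cursor

def connect_from(local_ip, dest, tries=PORT_MAX - PORT_MIN + 1):
    """Open an outbound TCP connection bound to local_ip, choosing the
    source port explicitly so each local IP can use the whole range."""
    cursor = _cursors.setdefault(
        local_ip, itertools.cycle(range(PORT_MIN, PORT_MAX + 1)))
    for _ in range(tries):
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        try:
            s.bind((local_ip, next(cursor)))  # explicit port: no auto-selection
            s.connect(dest)
            return s
        except OSError as err:
            s.close()
            # port busy (or 4-tuple clash): try the next port in the range
            if err.errno not in (errno.EADDRINUSE, errno.EADDRNOTAVAIL):
                raise
    raise OSError(errno.EADDRINUSE, "no free source port on " + local_ip)

# e.g. connect_from("192.0.2.1", ("203.0.113.10", 80))
===[cut]===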