On 5/12/2013 1:45 p.m., Eliezer Croitoru wrote:
> Hey Saravanan,
>
> The main issue is that we can try to support you in a very basic way but
> note that if it's a BUG it cannot be fixed later rather then porting a
> patch manually or to try newer versions of squid.

Going by the description here, and back in Aug when it was last posted, this is not exactly a bug. It is the normal behaviour of TCP combined with TPROXY limitations and large traffic flows. When sockets run out, traffic gets throttled until more become available. When the rate of constant TCP connection churn grows higher than the rate at which sockets become available, a backlog of client connections builds up, holding sockets open and waiting for service.

Part of the problem is TPROXY. Each outgoing connection requires the identical src-IP:dst-IP:dst-port triplet as the incoming one, so the 64K src-port range is shared between inbound and outbound connections. Normally this would be done using a different src-IP, with a full 64K ports available on each side of the proxy. So you *start* with that handicap, and then on top of it this proxy is churning through 42 sockets per second. In the time each port spends in TIME_WAIT before it can be re-used, roughly 40K other sockets have been needed (yes, 40K needed out of ~32K available). So TCP is constantly running a backlog, and most of the sockets that do become available are consumed by new client connections.

Imagine the machine only had 16 sockets in total, and each socket had to wait 10 seconds before it could be used again. Note that with a proxy each connection requires 2 sockets (a client connection and a server connection, i.e. in and out of the proxy). For the sake of simplicity this description assumes each socket is done with in almost zero time.

1) When traffic arrives at a rate of 1 connection every 2 seconds, everything looks perfectly fine.
 * 1 client socket gets used, and 1 server socket. They are then released, and for the next 10 seconds there are 14 sockets available.
 * during that 10 second period, 5 more connections arrive and 10 more sockets get used.
 * that leaves the machine with 4 free sockets at the same time as the first 2 are being re-added to the available pool, keeping 4-6 sockets constantly free.

2) Compare that to a traffic rate of just 1 connection every second. To begin with everything seems perfectly fine.
 * the first 8 connections happen perfectly. However, they take 8 seconds and completely empty the available pool of sockets.
 ** what is the proxy to do? It must wait 2 more seconds for the next sockets to become available.
 * during those 2 seconds another 2 connections have been attempted.
 * when the first 2 sockets become available, both get used by accept().
 * the socket pool is now empty again and the proxy must wait another 1 second for more sockets to become available.
 - the proxy now has 2 inbound connections waiting to be served, 7 inbound sockets in TIME_WAIT and 7 outbound sockets in TIME_WAIT.
 * when the second 2 sockets become available, one is used to receive the new waiting connection and one is used to service an existing connection.
 * things continue like this until we reach the 16 second mark.
 - this is a repeat of the point when no new sockets were finishing TIME_WAIT.
 * at the 20 second mark sockets are becoming available again.
 - the proxy now has 4 inbound connections waiting to be served, 6 inbound sockets in TIME_WAIT and 6 outbound sockets in TIME_WAIT.
 ... the cycle continues, with the gap between inbound and outbound growing by 2 sockets every 8 seconds.
If the clients were all extremely patient, the machine would end up with every socket being used by inbound connections and none left for outbound. However, Squid contains a reserved-FD feature to prevent that situation happening, and clients get impatient and disconnect when the wait is too long. So you will always see traffic flowing, but it will flow at a much reduced rate, with ever longer delays visible to clients and somewhat "bursty" flow rates as clients give up in bunches.

Notice how in (2) there is all that *extra* waiting time, above and beyond what the traffic would normally take going through the proxy. In fact, the slower the traffic moves through the proxy the worse the problem becomes, since without connection persistence each transaction's time is added on top of the TIME_WAIT delay before its sockets can be re-used.

The important thing to be aware of is that this is normal behaviour, nasty as it is. You will hit it on any proxy or relay software if you throw large numbers of new TCP connections at it fast enough.

There are two ways to avoid this:

1) reduce the number of sockets the proxy allows to be closed. In other words, enable persistent connections in HTTP, on both server and client connections (see the config sketch below). It is not perfect (especially in old Squid versions like 2.6) but it avoids a lot of TIME_WAIT delays between HTTP requests.
 * then reduce the request processing time spent holding those sockets, so that more traffic can flow through faster overall.

2) reduce the traffic loading on that machine. Proxying traffic takes time and resources no matter what you do. There is an upper limit on how much proxy software can handle; it does vary by configuration, but you appear to have exceeded the capacity of the proxy you are using.

SaRaVanAn: I'm not sure whether your version of Squid has the collapsed_forwarding feature or not. If it does, turning it on may help reduce the requirements on server sockets and raise the traffic capacity. Also, anything you can do to improve the caching will help reduce the server socket requirements as well.

Whatever service this is that you are running is long overdue for scaling up to multiple machines for the client contact. Consider doing that with a newer version of Squid; that way you can compare the two and present a case for upgrading the old one.

I've dropped in a few suggestions on how you can improve the service times in 2.6 with config tweaks, but other than persistent connections there is very marginal gain to be had unless you scale out.
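As a rough sketch only (the timeout values here are just illustrative, and whether collapsed_forwarding is available depends on your exact 2.6 build), the relevant directives look something like:

  # re-use TCP connections between requests instead of closing them
  client_persistent_connections on
  server_persistent_connections on
  # how long to wait for the next request on an idle client connection
  persistent_request_timeout 1 minute
  # how long idle server connections are kept open for re-use
  pconn_timeout 120 seconds
  # if your build has it: merge identical concurrent misses into one server fetch
  collapsed_forwarding on

The two persistence directives are what matter most here; the timeouts only control how long idle connections are held open waiting for re-use.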
> On 04/12/13 18:02, SaRaVanAn wrote:
>> Hi All,
>>    I need a help on this issue. On heavy network traffic with squid
>> running, link bandwidth is not utilized properly. If I bypass squid,
>> my link bandwidth is utilized properly.
>>
>> Updated topology:
>> =============
>>                                                  (10 Mbps Link)
>> client <-------> Squid Box <-------> Proxy client <------> Proxy server <---> webserver
>>
>> During problem scenario, I could see more tcp sessions with FIN_WAIT_1
>> state in Proxy server. I also observed that Recv-Q in CLOSE_WAIT
>> state is getting increased in Squid Box. The number of tcp sessions
>> from Squid to webserver are also getting dropped drastically.
>>
>> Squid.conf
>> ========
>> http_port 3128 tproxy transparent
>> http_port 80 accel defaultsite=xyz.abc.com
>> hierarchy_stoplist cgi-bin
>> acl VIDEO url_regex ^http://fa\.video\.abc\.com

Replace with:
  acl VIDEO dstdomain fa.video.abc.com

>> cache allow VIDEO
>> acl QUERY urlpath_regex cgi-bin \?
>> cache deny QUERY
>> acl apache rep_header Server ^Apache
>> broken_vary_encoding allow apache
>> cache_mem 100 MB
>> cache_swap_low 70
>> cache_swap_high 80
>> maximum_object_size 51200 KB
>> maximum_object_size_in_memory 10 KB

That's a pretty low object size limit for memory objects. If you can tune that higher, and increase cache_mem at all, it will help serve more objects faster out of RAM rather than waiting for disk loads to happen.
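For example, something along these lines (the sizes are only illustrative; how far you can raise them depends on the RAM available in that box):

  # example values only - serve more, and larger, objects straight from RAM
  cache_mem 256 MB
  maximum_object_size_in_memory 512 KB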
>> ipcache_size 8192
>> fqdncache_size 8192
>> cache_replacement_policy heap LFUDA
>> memory_replacement_policy heap LFUDA
>> cache_dir aufs //var/logs/cache 6144 16 256
>> access_log //var/logs/access.log squid
>> cache_log //var/logs/cache.log
>> cache_store_log none
>> mime_table //var/opt/abs/config/acpu/mime.conf
>> pid_filename //var/run/squid.pid
>> refresh_pattern -i fa.video.abc.com/* 600 0% 600 override-expire
>> override-lastmod reload-into-ims ignore-reload
>> refresh_pattern -i video.abc.com/* 600 0% 600 override-expire
>> override-lastmod reload-into-ims ignore-reload
>> refresh_pattern -i media.abc.com/* 600 0% 600 override-expire
>> override-lastmod reload-into-ims ignore-reload
>> refresh_pattern -i xyz.abc.com/.*\.js 600 200% 600 override-expire
>> override-lastmod reload-into-ims
>> refresh_pattern -i xyz.abc.com/.*\.gif 600 200% 600 override-expire
>> override-lastmod reload-into-ims
>> refresh_pattern -i xyz.abc.com/.*\.jpg 600 200% 600 override-expire
>> override-lastmod reload-into-ims
>> refresh_pattern -i xyz.abc.com/.*\.jpg 600 200% 600 override-expire
>> override-lastmod reload-into-ims
>> refresh_pattern -i xyz.abc.com/.*\.png 600 200% 600 override-expire
>> override-lastmod reload-into-ims
>> refresh_pattern -i xyz.abc.com/.*\.css 600 200% 600 override-expire
>> override-lastmod reload-into-ims
>> refresh_pattern -i ^http://.wsj./.* 10 200% 10 override-expire
>> override-lastmod reload-into-ims ignore-reload
>> refresh_pattern -i \.(gif|png|jpg|jpeg|ico)$ 480 100% 480
>> override-expire override-lastmod reload-into-ims
>> refresh_pattern -i \.(htm|html|js|css)$ 480 100% 480 override-expire
>> override-lastmod reload-into-ims
>> refresh_pattern ^ftp: 1440 20% 10080
>> refresh_pattern ^gopher: 1440 0% 1440
>> refresh_pattern . 0 20% 4320
>> quick_abort_min 0 KB
>> quick_abort_max 0 KB
>> negative_ttl 1 minutes
>> positive_dns_ttl 1800 seconds
>> forward_timeout 2 minutes
>> acl all src 0.0.0.0/0.0.0.0

Use this instead of the above:
  acl all src all

>> acl manager proto cache_object
>> acl localhost src 127.0.0.1/255.255.255.255
>> acl to_localhost dst 127.0.0.0/8

That is a very old and incorrect definition. It should be:
  acl to_localhost dst 127.0.0.0/8 0.0.0.0/32

>> acl SSL_ports port 443
>> acl Safe_ports port 80
>> acl Safe_ports port 21
>> acl Safe_ports port 443
>> acl Safe_ports port 70
>> acl Safe_ports port 210
>> acl Safe_ports port 1025-65535
>> acl Safe_ports port 280
>> acl Safe_ports port 488
>> acl Safe_ports port 591
>> acl Safe_ports port 777
>> acl CONNECT method CONNECT
>> acl video_server dstdomain cs.video.abc.com
>> always_direct allow video_server
>> acl PURGE method PURGE
>> http_access allow PURGE localhost
>> http_access deny PURGE
>> http_access allow manager localhost
>> http_access deny manager
>> http_access deny !Safe_ports
>> http_access deny CONNECT all
>> http_access allow all
>> icp_access allow all
>> tcp_outgoing_address 172.19.134.2
>> visible_hostname 172.19.134.2
>> server_persistent_connections off

Turning server_persistent_connections on will reduce the socket churn on server sockets and allow a higher traffic capacity to be reached.

>> logfile_rotate 1
>> error_map http://localhost:1000/abp/squidError.do 404
>> memory_pools off
>> store_objects_per_bucket 100
>> strip_query_terms off
>> coredump_dir //var/cache
>> store_dir_select_algorithm round-robin
>> cache_peer 172.19.134.2 parent 1000 0 no-query no-digest originserver
>> name=aportal
>> cache_peer www.abc.com parent 80 0 no-query no-digest originserver
>> name=dotcom
>> cache_peer guides.abc.com parent 80 0 no-query no-digest originserver
>> name=travelguide
>> cache_peer selfcare.abc.com parent 80 0 no-query no-digest
>> originserver name=selfcare
>> cache_peer abcd.mediaroom.com parent 80 0 no-query no-digest
>> originserver name=mediaroom
>> acl webtrends url_regex ^http://statse\.webtrendslive\.com
>> acl the_host dstdom_regex xyz\.abc\.com
>> acl abp_regex url_regex ^http://xyz\.abc\.com/abp
>> acl gbp_regex url_regex ^http://xyz\.abc\.com/gbp
>> acl abcdstatic_regex url_regex ^http://xyz\.goginflight\.com/static
>> acl dotcom_regex url_regex ^www\.abc\.com
>> acl dotcomstatic_regex url_regex ^www\.abc\.com/static
>> acl travelguide_regex url_regex ^http://guides\.abc\.com
>> acl selfcare_regex url_regex ^http://selfcare\.abc\.com
>> acl mediaroom_regex url_regex ^http://abcd\.mediaroom\.com
>> never_direct allow abp_regex
>> cache_peer_access aportal allow abp_regex

Alternatively, this will be a smidgin faster:

  acl webtrends dstdomain statse.webtrendslive.com
  acl the_host dstdomain .xyz.abc.com
  acl abp_regex urlpath_regex ^/abp
  acl gbp_regex urlpath_regex ^/gbp
  acl abcdstatic_regex dstdomain xyz.goginflight.com
  acl static urlpath_regex ^/static
  acl dotcom dstdomain www.abc.com
  acl travelguide dstdomain guides.abc.com
  acl selfcare dstdomain selfcare.abc.com
  acl mediaroom dstdomain abcd.mediaroom.com

  never_direct allow the_host abp_regex
  cache_peer_access aportal allow the_host abp_regex
  cache_peer_access dotcom allow CONNECT dotcom
  cache_peer_access travelguide allow travelguide
  cache_peer_access selfcare allow selfcare
  cache_peer_access mediaroom allow mediaroom
  cache deny webtrends

Snipped the second and third copies of the squid.conf settings. Or do you really have the access controls defined three times in your squid.conf? If so, that could be cleaned up too for even more speed gains.

Amos