On Tue, 23 Feb 2010 09:08:22 -0300, Felipe W Damasio <felipewd@xxxxxxxxx> wrote:
> Hi Mr. Jeffries,
>
> 2010/2/22 Amos Jeffries <squid3@xxxxxxxxxxxxx>:
>>> The time to do a "/usr/bin/time squidclient
>>> http://www.terra.com.br/portal" goes down almost immediately after
>>> starting squid.
>>
>> Please define 'down'. Error pages returning? TCP links closing
>> unexpectedly? TCP links hanging? Packets never arriving at Squid or
>> the web server?
>
> Without any clients, time squidclient is 0.03 seconds.
>
> Shortly after starting squid, the time goes to 3.03 seconds, 6 seconds,
> 21 seconds, 3.03 seconds again... and sometimes goes back to
> 0.03-0.05s.
>
> In all cases the request goes to the webserver and back; it's the
> time it takes to do so that goes wild :)

The total times of 0.03-21 sec you are measuring, though, are round-trip
times. They involve DNS lookups and server TCP handshake setups, and in
2.7 can be short-circuited by memory-cached objects. All of that is
balanced against the background startup memory allocations, helper
startup and checks.

>> Is it only for a short period immediately after starting Squid? (i.e.
>> in the lag time between startup and ready for service?)
>
> No. We only test it after cache.log shows the "ready to serve
> requests" message.
>
> After that, we wait around 5 to 10 seconds, run the ebtables rules
> (we're using a bridge setup), and the clients (around 6000 cable
> modems) start going through squid.
>
> And squidclient starts presenting those times.
>
>>> We tried turning off the cache so we can't have I/O-related
>>> slowdowns and had the same results. Neither CPU nor memory seems to
>>> be the problem.
>>
>> How did you 'turn off the cache'? By adding "cache deny all", removing
>> the "cache_dir" entries, or removing "cache_mem"?
>
> On squid-2.7, we did it by compiling the null store-io module and using:
>
> cache_dir null /tmp
>
> And on squid-3.1 we did it using "cache deny all".

Ah, those are very different in result. "cache deny all" works the same
in both versions and prevents anything from being stored. "cache_dir null"
in squid-2.7 is the same as having no cache_dir at all in squid-3.1,
which prevents only disk access. Objects are still cached in memory.

>
>> If you are using TPROXY it could be network limits in conntrack, since
>> TPROXY requires socket-level tracking and the default conntrack limits
>> are a bit low for >100Mbit networks.
>
> We changed the following proc configuration after browsing the web for
> tips:
>
> echo 7 > /proc/sys/net/ipv4/tcp_fin_timeout
> echo 15 > /proc/sys/net/ipv4/tcp_keepalive_intvl
> echo 3 > /proc/sys/net/ipv4/tcp_keepalive_probes
> echo 65536 > /proc/sys/vm/min_free_kbytes
> echo "262144 1024000 4194304" > /proc/sys/net/ipv4/tcp_rmem
> echo "262144 1024000 4194304" > /proc/sys/net/ipv4/tcp_wmem
> echo "1024000" > /proc/sys/net/core/rmem_max
> echo "1024000" > /proc/sys/net/core/wmem_max
> echo "512000" > /proc/sys/net/core/rmem_default
> echo "512000" > /proc/sys/net/core/wmem_default
> echo "524288" > /proc/sys/net/ipv4/netfilter/ip_conntrack_max
> echo "3" > /proc/sys/net/ipv4/tcp_synack_retries
> ifconfig br0 txqueuelen 1000
>
> These change the default settings of conntrack_max to a higher value,
> right?

I'm not sure of the exact settings myself, but possibly.
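A quick way to check whether conntrack itself is under pressure is to
compare the live entry count against the limit. Something along these
lines should do it (a rough sketch only; the /proc paths vary between
kernel versions, these match the ipv4 netfilter ones you are already
tuning above):

  # Compare current conntrack usage against the configured ceiling.
  # Paths are kernel-dependent; adjust if your kernel exposes them
  # under /proc/sys/net/netfilter/ instead.
  cat /proc/sys/net/ipv4/netfilter/ip_conntrack_count   # entries in use right now
  cat /proc/sys/net/ipv4/netfilter/ip_conntrack_max     # the 524288 limit you set

If the count sits near the max, new connections get dropped and the
kernel normally logs "ip_conntrack: table full" messages, which ties in
with your dmesg check below.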
> dmesg doesn't show any errors or warnings.

That's good.

>
>> It could simply be an overload of the single Squid, which is only
>> really confirmed to handle 5,000 requests per second. If your 300Mbps
>> includes more than that number of unique requests you may need another
>> Squid instance. These configurations are needed for high throughput
>> using Squid:
>> http://wiki.squid-cache.org/ConfigExamples/MultiCpuSystem
>> http://wiki.squid-cache.org/ConfigExamples/ExtremeCarpFrontend
>
> We have between 300 and 500 requests per second according to the cache
> manager.
>
> But what's puzzling is that squid (2.7) didn't have this behavior
> before. Around 15 days ago it started slowing down like this.
>
> The uptime of the server is 33 days, and we didn't want to reboot
> the server since we're using it in bridge mode.
>
> But could this slowdown be a side-effect of some network degradation...?
>
> Also, why does squid performance improve by using multiple
> http_port if it is, in the end, a single process? The bottleneck seems
> to be the network part of the system, correct?

Yes. The difference with http_port does sound like the
connection-accepting system is involved somehow.

Squid up to and including 3.1 is designed to accept one TCP link at a
time, and has a small delay queuing each one while it waits for the
handler to run. I suspect this short delay may be extended by other
event processing in the background. The delay is per-http_port, so with
multiple ports one might be delayed or have a full queue while the
others are still happily accepting requests.

Would you mind grabbing the 3.HEAD code
(http://www.squid-cache.org/Versions/v3/HEAD) and running one of these
tests on it? We have re-worked the accept logic there so that the port
is ready to accept new connections while the queuing happens, instead of
after it, and done some further optimization on the queuing itself which
will hopefully cut the top off the longer delays while raising the
number of connections accepted. It would be very interesting to see the
difference confirmed outside my lab.

Amos
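P.S. In case it saves you some time, fetching and building a snapshot
looks roughly like this. A sketch only: the exact tarball name under
that URL changes with each nightly bundle, so substitute the current
one, and the install prefix is just an example to keep the test copy
apart from your production 2.7/3.1:

  # Build a test copy of 3.HEAD alongside the production install.
  # NOTE: tarball name below is illustrative; take the current one from
  # http://www.squid-cache.org/Versions/v3/HEAD
  wget http://www.squid-cache.org/Versions/v3/HEAD/squid-3.HEAD.tar.gz
  tar xzf squid-3.HEAD.tar.gz
  cd squid-3.HEAD*/
  ./configure --prefix=/usr/local/squid-head   # example prefix only
  make && make install

Then repeat the same "/usr/bin/time squidclient
http://www.terra.com.br/portal" runs against the test instance and
compare.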