On Mon, 1 Nov 2010 22:55:12 +0000, Declan White <declanw@xxxxxxxxxxxx> wrote:
> On Mon, Nov 01, 2010 at 09:36:53PM +0000, Amos Jeffries wrote:
>> On Mon, 1 Nov 2010 15:00:21 +0000, declanw@xxxxxxxxxxxx wrote:
>> > I went for a rummage in the code for the buffer size decisions, but
>> > got very very lost in the OO abstractions very quickly. Can anyone
>> > point me at anything I can tweak to fix this?
>>
>> It's a global macro defined by auto-probing your operating system's
>> TCP receive buffer when building. Default is 16KB and max is 64KB.
>> There may also be auto-probing done at run time.
>>
>> It is tunable at run-time with
>> http://www.squid-cache.org/Doc/config/tcp_recv_bufsize/
>
> Oh thank God! Thanks :) (and annoyed with myself that I missed that)
>
>> The others have already covered the main points of this. ufdbGuard is
>> probably the way to go once you have restricted the size down by
>> eliminating all the entries which can be done with dstdomain and
>> other faster ACL types.
>
> Aye, I've got much to ruminate over, but it does all sound promising.
>
>> > Beyond that, I assume, to get the most out of a multi-CPU system I
>> > should be running one squid per CPU, which means I need more IPs
>> > and that they can't share their memory or disk caches with each
>> > other directly, and I would need to switch on HTCP to try and
>> > re-merge them?
>>
>> Possibly. You may want to test out 3.2 with SMP support. Reports have
>> been good so far (for a beta).
>
> Alas, I'm already flying a little too close to the wind just running
> 3.1.9. This'll all be live soon, now we've traced an ftp code nullref
> coredump:
>
> +++ ../squid-3.1.8/src/ftp.cc Wed Oct 27 14:21:01 2010
> @@ -3707,1 +3707,1 @@
> - else
> + else if (ctrl.last_reply)
> @@ -3709,0 +3709,2 @@
> + else
> + reply = "" ;

Looks like one of the side effects of 3090:
http://www.squid-cache.org/Versions/v3/3.1/changesets/

(just fixing the reply text makes squid produce a regular error page
where it should have produced an auth challenge to get some usable
Basic-auth credentials).

>
>> > Build: Sun Solaris 9
>> > PATH=~/sunstudio12.0/bin:$PATH ./configure CC=cc CXX=CC
>> > CFLAGS="-fast -xtarget=ultra3i -m64 -xipo"
>> > CXXFLAGS="-fast -xtarget=ultra3i -m64 -xipo"
>> > --enable-cache-digests --enable-removal-policies=lru,heap
>> > --enable-storeio=aufs,ufs --enable-devpoll
>>
>> Ah. You will definitely be wanting 3.1.9. /dev/poll support is
>> included and several ACL problems specific to the S9 are fixed.
>
> Aye, I'm the one that whined at my local dev to patch devpoll back in ;-)
>
> Actually, I *just* found out my freshly deployed 3.1.9 with
> --enable-devpoll does NOT use devpoll, as configure prioritises poll()
> above it, which kinda defeats the point of the exercise :)

Gah. My fault. Sorry. Fix applied. It *may* have been in time for
today's snapshot.
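If you want to double-check which event loop a given binary actually
selected, it should be mentioned in the startup lines of cache.log. A
quick check (log path assumed here, and the exact message wording
varies a little between versions):

  grep -i "IO loop" /var/log/squid/cache.log
  # expect something like: Using /dev/poll for the IO loop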
>
> --- configure~ Mon Nov 1 21:26:53 2010
> +++ configure Mon Nov 1 21:26:53 2010
> @@ -46912,10 +46912,10 @@
> SELECT_TYPE="epoll"
> elif test -z "$disable_kqueue" && test "$ac_cv_func_kqueue" = "yes" ; then
> SELECT_TYPE="kqueue"
> -elif test -z "$disable_poll" && test "$ac_cv_func_poll" = "yes" ; then
> - SELECT_TYPE="poll"
> elif test "x$enable_devpoll" != "xno" && test "x$ac_cv_devpoll_works" = "xyes"; then
> SELECT_TYPE="devpoll"
> +elif test -z "$disable_poll" && test "$ac_cv_func_poll" = "yes" ; then
> + SELECT_TYPE="poll"
> elif test -z "$disable_select" && test "$ac_cv_func_select" = "yes" ; then
> case "$host_os" in
> mingw|mingw32)
>
> has fixed that. Yes, I should have edited the .in and autoconfed, but
> I'm scared of autoconf.
>
>> > Tuney bits of Config:
>> > htcp_port 0
>> > icp_port 0
>> > digest_generation off
>> > quick_abort_min 0 KB
>> > quick_abort_max 0 KB
>> > read_ahead_gap 64 KB
>> > store_avg_object_size 16 KB
>> > read_timeout 5 minutes
>> > request_timeout 30 seconds
>> > persistent_request_timeout 30 seconds
>> > pconn_timeout 3 seconds
>>
>> NOTE: pconn_timeout tuning can no longer be done based on info from
>> older versions. There have been a LOT of fixes that make 3.1.8+ pconn
>> support HTTP compliant, used more often and less resource-hungry than
>> older versions.
>
> Oh, I hadn't measured it or anything :) I've just seen Linux servers
> collapse from complications with SYN queues and client exponential
> backoff. I just need a hint of a persistent connection to avoid that
> connection-thrashing scenario, but I don't have the resources to keep
> things around 'just in case'.

This reminds me we don't have a max limit on active pconns.

>
>> > cache_mem 512 MB
>> > maximum_object_size_in_memory 64 KB
>>
>> NP: It's worth noting that 3.x has fixed the large-file-in-memory
>> problems which 2.x suffers from. 3.x will handle them in linear time
>> instead of with exponential CPU load.
>
> Good to hear :) But I don't have the memory to stretch much beyond 512
> atm, as squid seems to take 1.2GB VM with these settings alone, and no
> disk cache. I do wonder if I overcooked the read_ahead_gap though..

64KB is about the buffer size Squid uses internally, so that is about
right for keeping a completely full buffer, I think.

>
>> > memory_replacement_policy heap GDSF
>> > ignore_expect_100 on
>>
>> If this is actually a problem you may see extra benefit from the 3.2
>> beta here as well.
>
> The GDSF is just to up the per-req hits. I'm hoping to get disk cache
> going for larger objects later with the opposite policy emphasis.
>
> To be frank, I don't know if I need ignore_expect_100 on or not :)

Okay. I think from the resource comments above you want it OFF.

Squid will respond to HTTP/1.1 "Expect:" requests immediately, and
broken clients that can't handle the required HTTP/1.1 replies
disappear with error pages.

When set to on, it makes Squid act like really old HTTP/1.0 software
and not respond. All clients, broken or not, then hold the connection
resources open while they wait for a recovery timeout to kick in. It's
been known to take several minutes for some apps.

Amos
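P.S. For anyone curious what "respond immediately" means on the wire:
with ignore_expect_100 off, 3.1 answers the expectation straight away
instead of waiting around for a request body. A rough sketch of the
exchange (headers trimmed, host and sizes invented):

  POST /upload HTTP/1.1
  Host: www.example.com
  Expect: 100-continue
  Content-Length: 10240

  HTTP/1.1 417 Expectation Failed

A proper HTTP/1.1 client can then retry without the Expect header; a
broken one just gets the error page, but either way the connection is
released straight away instead of idling until a timeout.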
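P.P.S. On the earlier dstdomain point: the idea is to catch the bulk
of the list with a cheap ACL type before anything reaches the
regex-based tools. A minimal sketch (file path and ACL name invented
for illustration):

  # one domain per line; a leading dot also matches subdomains
  acl blockdoms dstdomain "/usr/local/squid/etc/blocked.domains"
  http_access deny blockdoms

dstdomain is a tree lookup rather than a linear scan of patterns, so
it stays fast even with very large lists.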