Hello Alex, Just wanted to write back and say you advice helped. After digging into our problem a bit further we found that we were being effected by the bug#4796. Applying the patch you suggested to our build for squid fixed the problem and we are able to run squid under much greater loads with out issue now. Thanks, -- Joseph M Jones Senior Application Engineer EAN – Expedia Affiliate Network From: squid-users <squid-users-bounces@xxxxxxxxxxxxxxxxxxxxx> on behalf of Alex Rousskov <rousskov@xxxxxxxxxxxxxxxxxxxxxxx> Sent: Thursday, October 11, 2018 16:44 To: squid-users@xxxxxxxxxxxxxxxxxxxxx Subject: Re: Squid workers failing to restart after log rotation event. On 10/11/2018 06:30 AM, Joseph Jones wrote: > I'm trying to find a root cause for failed workers. My suggestions: 0. Upgrade to the latest Squid v4. 1. Disable memory cache: cache_mem 0 2. Maintain one cache.log per worker. For example: cache_log /usr/local/squid/var/logs/cache-${process_number}.log 3. Focus on the logged messages of the worker that fails _first_. Zero in on the first failure of that worker. Why did it fail? Your log snippets appear to show what happened _after_ at least one failure. It is usually best to focus on the original failure. Check system logs as well -- some relevant messages may only appear there. 4. With the information from #3, check whether you are suffering from bug #4796. It has a (hidden in the PR discussion) short-term fix: https://bugs.squid-cache.org/show_bug.cgi?id=4796 https://github.com/squid-cache/squid/pull/257#issuecomment-427271426 If your bug is different, and the first failure of the failed-first worker left a core dump, post a stack trace. Configure your OS to allow core dumps, of course. You can just skip to step #4 and see if the patch helps, but steps #0-3 are useful in general should you face similar problems in the future. HTH, Alex. We have three squid instances that act as transparent forward proxies that limit internet connectivity for our network by doing url whitelisting. Current throughput per instances is about 90MB/s. after a restart of squid all workers seem to be working just fine, but after about an hour some of the workers fail and they never come back until we do a complete restart. These are EC2 instances in AWS (c5.4xlarge). so we have 16 vCPU to work with. but that's really 8 Cores and 2 Threads per core. > > CPU and Memory loads are small. Disk IO could be a concern. We've ran some load test on a different instances with logging turned off and we were able to get a higher throughput without worker failure. We don't have caching enabled as most of our traffic is SSL anyway. I Was hoping someone could point us in a direction we should take our testing or if from the information I've given can tell use any obvious this we may be doing wrong. > > > $ uptime > 18:28:30 up 6 days, 2:13, 1 user, load average: 0.88, 1.10, 1.09 > > $ free -m > total used free shared buff/cache available > Mem: 30987 1728 25264 1156 3994 27523 > Swap: 0 0 0 > > > > 2018/10/10 18:19:42 kid2| Squid Cache (Version 4.1): Terminated abnormally. > CPU Usage: 0.036 seconds = 0.022 user + 0.014 sys > Maximum Resident Size: 92544 KB > Page faults with physical i/o: 0 > 2018/10/10 18:19:42 kid1| Closing HTTP(S) port [::]:3129 > 2018/10/10 18:19:42 kid1| Closing HTTP(S) port [::]:3128 > 2018/10/10 18:19:42 kid1| Closing HTTP(S) port [::]:3130 > 2018/10/10 18:19:42 kid1| storeDirWriteCleanLogs: Starting... > 2018/10/10 18:19:42 kid1| Finished. Wrote 0 entries. > 2018/10/10 18:19:42 kid1| Took 0.00 seconds ( 0.00 entries/sec). > 2018/10/10 18:19:42 kid1| FATAL: kid1 registration timed out > 2018/10/10 18:19:42 kid1| Squid Cache (Version 4.1): Terminated abnormally. > CPU Usage: 0.034 seconds = 0.021 user + 0.013 sys > Maximum Resident Size: 92544 KB > Page faults with physical i/o: 0 > > $ cat /etc/redhat-release > CentOS Linux release 7.5.1804 (Core) > > $ squid -v > Squid Cache: Version 4.1 > Service Name: squid > > This binary uses OpenSSL 1.0.2k-fips 26 Jan 2017. For legal restrictions on distribution see https://www.openssl.org/source/license.html > > configure options: '--build=x86_64-redhat-linux-gnu' '--host=x86_64-redhat-linux-gnu' '--program-prefix=' '--prefix=/usr' '--exec-prefix=/usr' '--bindir=/usr/bin' '--sbindir=/usr/sbin' '--sysconfdir=/etc' '--datadir=/usr/share' '--includedir=/usr/include' '--libdir=/usr/lib64' '--libexecdir=/usr/libexec' '--sharedstatedir=/var/lib' '--mandir=/usr/share/man' '--infodir=/usr/share/info' '--exec_prefix=/usr' '--libexecdir=/usr/lib64/squid' '--localstatedir=/var' '--datadir=/usr/share/squid' '--sysconfdir=/etc/squid' '--with-logdir=$(localstatedir)/log/squid' '--with-pidfile=$(localstatedir)/run/squid.pid' '--disable-dependency-tracking' '--enable-follow-x-forwarded-for' '--enable-auth' '--enable-auth-basic=DB,LDAP,NCSA,NIS,PAM,POP3,RADIUS,SASL,SMB,getpwnam,fake' '--enable-auth-ntlm=fake' '--enable-auth-digest=file,LDAP,eDirectory' '--enable-auth-negotiate=kerberos,wrapper' '--enable-external-acl-helpers=wbinfo_group,kerberos_ldap_group,LDAP_group,delayer,file_userip,SQL_session,unix_group,session,time_quota' '--enable-cache-digests' '--enable-cachemgr-hostname=localhost' '--enable-delay-pools' '--enable-epoll' '--enable-smp' '--enable-icap-client' '--enable-ident-lookups' '--enable-linux-netfilter' '--enable-removal-policies=heap,lru' '--enable-snmp' '--enable-storeio=aufs,diskd,ufs,rock' '--enable-wccpv2' '--enable-esi' '--enable-security-cert-generators' '--enable-security-cert-validators' '--enable-icmp' '--with-aio' '--with-default-user=squid' '--with-filedescriptors=16384' '--with-dl' '--with-openssl' '--enable-ssl-crtd' '--with-pthreads' '--with-included-ltdl' '--disable-arch-native' '--enable-ecap' '--without-nettle' 'build_alias=x86_64-redhat-linux-gnu' 'host_alias=x86_64-redhat-linux-gnu' 'CFLAGS=-O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic' 'LDFLAGS=-Wl,-z,relro ' 'CXXFLAGS=-O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-g cc-switches -m64 -mtune=generic -fPIC' 'PKG_CONFIG_PATH=:/usr/lib64/pkgconfig:/usr/share/pkgconfig' --enable-ltdl-convenience > > $ cat /etc/squid/squid.conf > workers 12 > hopeless_kid_revival_delay 5 minute > # Default 'squid' logformat with request size and TLS SNI added > logformat ean_squid %ts.%03tu %6tr %>a %Ss/%03>Hs %>st %<st %rm %ru %ssl::>sni %[un %Sh/%<a %mt > logfile_rotate 0 > access_log daemon:/var/log/squid/access.log logformat=ean_squid > debug_options ALL,1 > > # Only allow cachemgr access from localhost > http_access allow localhost manager > http_access deny manager > > acl localnet src 10.26.128.0/21 > > acl SSL_ports port 443 > acl Safe_ports port 80 # http > acl Safe_ports port 443 # https > > acl CONNECT method CONNECT > > # > # Recommended minimum Access Permission configuration: > # > # Deny requests to certain unsafe ports > http_access deny !Safe_ports > > # Deny CONNECT to other than secure SSL ports > http_access deny CONNECT !SSL_ports > > # Allow requests from the local network (see acl at the top) > http_access allow localnet > > # > # INSERT YOUR OWN RULE(S) HERE TO ALLOW ACCESS FROM YOUR CLIENTS > # > > # Just for debugging > # debug_options ALL,1 33,2 rotate=0 > > acl https_whitelist ssl::server_name "/etc/squid/whitelist.txt" > acl http_whitelist dstdomain "/etc/squid/whitelist.txt" > > acl step1 at_step SslBump1 > acl step2 at_step SslBump2 > acl step3 at_step SslBump3 > > http_access allow http_whitelist > > ssl_bump peek step1 all > > ssl_bump peek step2 https_whitelist > ssl_bump splice step3 https_whitelist > ssl_bump terminate step2 all > > > # disable caching > cache deny all > > # And finally deny all other access to this proxy > http_access deny all > > # Squid normally listens to port 3128 > http_port 3129 intercept > http_port 3128 > https_port 3130 cert=/etc/pki/tls/certs/squid.pem key=/etc/pki/tls/private/squid.key ssl-bump intercept > > visible_hostname squid > > # Uncomment and adjust the following to add a disk cache directory. > #cache_dir ufs /var/spool/squid 100 16 256 > > # Leave coredumps in the first cache dir > coredump_dir /var/spool/squid > > # > # Add any of your own refresh_pattern entries above these. > # > refresh_pattern ^ftp: 1440 20% 10080 > refresh_pattern ^gopher: 1440 0% 1440 > refresh_pattern -i (/cgi-bin/|\?) 0 0% 0 > refresh_pattern . 0 20% 4320 > > > -- > > > Joseph M Jones > > Senior Application Engineer > Expedia Partner Solutions > _______________________________________________ > squid-users mailing list > squid-users@xxxxxxxxxxxxxxxxxxxxx > http://lists.squid-cache.org/listinfo/squid-users > _______________________________________________ squid-users mailing list squid-users@xxxxxxxxxxxxxxxxxxxxx http://lists.squid-cache.org/listinfo/squid-users _______________________________________________ squid-users mailing list squid-users@xxxxxxxxxxxxxxxxxxxxx http://lists.squid-cache.org/listinfo/squid-users