Re: SQUID memory error after vm.swappiness changed from 60 to 10

Bike dernikov1 <dernikov1@xxxxxxxxx> · Mon, 13 Nov 2017 10:46:44 +0100

On Fri, Nov 10, 2017 at 11:11 PM, Marcus Kool
<marcus.kool@xxxxxxxxxxxxxxx> wrote:
>
>
> On 10/11/17 12:11, Bike dernikov1 wrote:
>>
>> On Thu, Nov 9, 2017 at 5:13 PM, Marcus Kool <marcus.kool@xxxxxxxxxxxxxxx>
>> wrote:
>>>
>>>
>>>
>>> On 09/11/17 11:04, Bike dernikov1 wrote:
>>> [snip]
>>>>>>
>>>>>>
>>>>>> Memory consumption: squid uses the largest part of memory (12GB now,
>>>>>> the second process uses 300MB), 14GB is used by all processes. So squid
>>>>>> uses over 80% of total used memory.
>>>>>> So no, there are no problematic processes. But we changed the swappiness
>>>>>> setting.
>>>>>
>>>>>
>>>>>
>>>>> Did you monitor Squid for growth (it can start with 12 GB and grow
>>>>> slowly) ?
>>>>
>>>>
>>>>
>>>> Yes, we are monitoring continuously.
>>>> Now:
>>>> Output from free -m.
>>>>
>>>>         total   used   free  shared  buff/cache  available
>>>> Mem:    24101  20507    256     146        3337       3034
>>>> Swap:   24561   5040  19521
>>>>
>>>> vm.swappiness=40
>>>>
>>>> Memory by process:
>>>>          Virt    RES   SHR   MEM%
>>>> squid   22,9G   18.7  8164   79,6
>>>
>>>
>>>
>>> Hmm. Squid grew from 12 GB to 18.7 GB (23 GB virtual).
>>
>>
>> Today the problem appeared again after logrotate at 2:56 AM.
>> Used memory peaked at 23,7GB.
>
>
> ok. it is clear that Squid grows too much.
> On a 24GB system with many helpers and a URL filter I think the maximum size
> should be 14GB.

We set cache_mem to 14GB. So far no problems have appeared.
We got the helper settings completely wrong. We had problems with the
keytab cache, so we thought that increasing the number of helpers would help.
At first we set up 700 kerberos helpers (insane in hindsight; now 50 / 13 active).
Until we disabled the cache we couldn't stabilize the server.
The disk was too slow, IO wait exploded, and CPU load was at one point over
200. Users weren't happy.
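
To check how many helpers are actually running at a given moment, a small script can count them. This is only a sketch: the helper names are the ones from our cache.log, the function names count_helpers and running_helpers are made up for this example, and it assumes the process list comes from `ps -eo args=`, where a helper may show up with its full path or with its name wrapped in parentheses.

```python
import os
import subprocess

# Helper names as they appear in cache.log in this thread.
HELPER_NAMES = ("negotiate_kerberos_auth", "squidGuard")


def count_helpers(ps_lines, names=HELPER_NAMES):
    """Count processes whose program name matches one of `names`.

    `ps_lines` are lines of `ps -eo args=` output (full command lines).
    """
    counts = {name: 0 for name in names}
    for line in ps_lines:
        fields = line.split()
        if not fields:
            continue
        # Accept "/usr/lib/squid/negotiate_kerberos_auth" and the
        # "(negotiate_kerberos_auth)" form alike.
        program = os.path.basename(fields[0].strip("()"))
        if program in counts:
            counts[program] += 1
    return counts


def running_helpers():
    """Query the live process list (assumes a procps-style `ps`)."""
    result = subprocess.run(
        ["ps", "-eo", "args="], capture_output=True, text=True, check=True
    )
    return count_helpers(result.stdout.splitlines())
```

Calling running_helpers() from cron every minute and logging the result would show how fast the helper count grows. It counts running processes only; it does not tell which helpers are busy.
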

>> Before logrotate started, cache was at 2GB, buffer at 1,5GB.
>> After logrotate started, cache jumped to 3.7GB and buffer was unchanged at
>> 1,5GB.
>>
>> Fork errors stopped after 1 minute, at 2:57.
>> Cache memory dropped by 500MB to 3.2GB and stayed at that level
>> until morning; buffer stayed at 1.5GB.
>>
>> After 4 minutes, at 3:00, a new WARNING appeared: external ACL queue
>> overload. Using stale results.
>>
>> We have night shift and they told us that Internet worked ok.
>>
>> After restart at around 7.00AM used memory dropped from 22 GB to 7GB,
>> cache and buffer remain at same levels.
>
>
> How come Squid uses 7 GB at startup when there is no disk cache ?
>
>>> With vm.swappiness=40 Linux starts to page out parts of processes when
>>> they
>>> occupy more than 60% of the memory.
>>> This is a potential bottleneck and I would have also decreased
>>> vm.swappiness
>>> to 10 as you did.
>>>
>>> My guess is that Squid starts too many helpers in a short time frame and
>>> that because of paging there are too many forks in progress
>>> simultaneously
>>> which causes the memory exhaustion.
>>
>>
>> We are now testing with 100 helpers for negotiate_kerberos_auth.
>> vm.swappiness returned to 60.
>>
>>> I suggest to reduce the memory cache of Squid by 50% and set
>>> vm.swappiness
>>> to 20.
>>
>>
>> Squid cache memory is set to 14GB, reduced in two steps from 20GB to
>> 16GB and then to 14GB.
>
>
> are you saying that you have
>    cache_mem 14G
> If yes, you should read the memory FAQ and reduce this.
> 'cache_mem 14G' explains that Squid starts 'small' and grows over time.

For our case, what do you recommend: 10GB or even lower?
I plan to read the FAQ today; I hope I will have some peace to concentrate.

>>> And then observe:
>>> - total memory use
>>> - total swap usage (should be lower than the 5 GB that you have now)
>>> - number of helper processes that are started in short time frames
>>> And then in small steps increase the memory cache and maybe further
>>> reduce
>>> vm.swappiness to 10.
>>
>>
>> If we survive with the current setup, we will continue reducing as you
>> suggest.
>> The last resort will be disabling swap (swapoff), but just as a test with 6
>> eyes on monitoring :)
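
The first two points to observe (total memory use and total swap usage) can be read from `free -m`. A minimal sketch, assuming the column layout shown earlier in this thread; parse_free and used_percent are names invented here, not part of any tool, and SAMPLE is simply the output posted above.

```python
# Output of `free -m` as posted earlier in this thread (values in MiB).
SAMPLE = """\
        total   used   free  shared  buff/cache  available
Mem:    24101  20507    256     146        3337       3034
Swap:   24561   5040  19521
"""


def parse_free(text):
    """Turn `free -m` output into {'Mem': {...}, 'Swap': {...}}."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    columns = lines[0].split()  # total, used, free, ...
    stats = {}
    for ln in lines[1:]:
        label, *values = ln.split()
        # The Swap row has fewer columns; zip stops at the shorter list.
        stats[label.rstrip(":")] = dict(zip(columns, map(int, values)))
    return stats


def used_percent(row):
    """Share of `total` that is `used`, in percent, one decimal."""
    return round(100.0 * row["used"] / row["total"], 1)
```

With the figures posted earlier, used_percent gives about 85.1 for Mem and 20.5 for Swap. Logging both values every few minutes should make it easy to see whether swap usage stays below the 5 GB we have now.
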
>>
>>>> squidguard: two processes, 300MB each.
>>>>
>>>> CPU 0.33 0.37 0.43
>>>>
>>>>> Squid cannot fork and higher swappiness increases the amount of memory
>>>>> that
>>>>> the OS can use to copy processes.
>>>>> It makes me think that you have the memory overcommit set to 2 (no
>>>>> overcommit).
>>>>> What is the output of the following command ?
>>>>>      sysctl  -a | grep overcommit
>>>>
>>>>
>>>>
>>>> Command output:
>>>>
>>>> vm.nr_overcommit_hugepages = 0
>>>> vm.overcommit_kbytes = 0
>>>> vm.overcommit_memory = 0
>>>> vm.overcommit_ratio = 50
>>>>
>>>> cat /proc/sys/vm/overcommit_memory
>>>> 0
>>>
>>>
>>>
>>> The overcommit settings look fine.
>>
>>
>> At least something right :)
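
For reference, the value of vm.overcommit_memory can be read and interpreted with a few lines of Python. A sketch only: the mode descriptions follow the kernel's documented values (0, 1, 2), and read_vm_setting and describe_overcommit are names made up for this example.

```python
from pathlib import Path

# Meaning of vm.overcommit_memory according to the kernel documentation.
OVERCOMMIT_MODES = {
    0: "heuristic overcommit (kernel default)",
    1: "always overcommit",
    2: "strict accounting, no overcommit",
}


def read_vm_setting(name, root="/proc/sys/vm"):
    """Read an integer sysctl such as 'overcommit_memory' or 'swappiness'."""
    return int(Path(root, name).read_text().strip())


def describe_overcommit(mode):
    """Return a short description of an overcommit_memory value."""
    return OVERCOMMIT_MODES.get(mode, "unknown mode")
```

On the proxy, describe_overcommit(read_vm_setting("overcommit_memory")) should report the heuristic mode, matching the 0 shown above.
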
>>
>>>>
>>>>>> Advice for some settings:
>>>>>> We have an absolute max peak of 2500 users who use squid (of 2800);
>>>>>> what are the recommended settings for:
>>>>>> negotiate_kerberos_children start/idle
>>>>>> squidguard helpers.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> I have little experience with kerberos, but most likely this is not the
>>>>> issue.
>>>>> When Squid cannot fork the helpers, helper settings do not matter much.
>>>>
>>>>
>>>>
>>>>> For 2500 users you probably need 32-64 squidguard helpers.
>>>>
>>>>
>>>>
>>>> Can you confirm: For 2500 users:
>>>>
>>>> url_rewrite children X (squidguard)  32-64 will be ok ? We have set
>>>> much larger number.
>>
>>
>> Squidguard url_rewrite children was set to 64.
>>
>>> Did I understand it correctly that earlier in this reply you said that
>>> there
>>> are two squidguard processes (300 MB each).
>>
>>
>> Yes (the first two processes in htop, two rewrite children); the others were at 0.0%.
>>
>>> ufdbGuard is faster than squidGuard and has multithreaded helpers.
>>> ufdbGuard needs fewer helpers than squidGuard.
>>> If you have many more than 64 url rewrite helpers then I suggest
>>> switching to ufdbGuard as soon as possible since the memory usage is then
>>> at least 600% less.
>>
>>
>> ufdbGuard has a few strong features: development, Kerberos,
>> concurrency/multithreading.
>> As I wrote, if we read documentation slower we wouldn't
>> Does ufdbGuard support LDAP secure auth? We tried LDAP secure with
>> squidguard without success.
>
>
> ufdbGuard supports any user database with the "execuserlist" feature.
> See the Reference Manual for details.

As far as I can tell, we have a lot of work in front of us.
But no price is too high to get rid of TMG :)
Thanks for the help.

>>>> For  helper:
>>>> negotiate_kerberos_auth
>>>>
>>>> auth_param negotiate children X startup Y idle Z. What X, Y, Z are
>>>> best for our user number ?
>>>>
>>>> We disabled kerberos replay cache because of disk performance (4 SAS
>>>> DISK  15K, RAID 10) (iowait jumped high, and CPU load jumped to min
>>>> 40 max 200).
>>>> We don't use disk caching.
>>>>
>>>> Thanks for help,
>>>>
>>>>> Marcus
>>>>>
>>>>>
>>>>>> Thanks for help,
>>>>>>
>>>>>> On Wed, Nov 8, 2017 at 10:53 AM, Marcus Kool
>>>>>> <marcus.kool@xxxxxxxxxxxxxxx> wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> There is definitely a problem with available memory because Squid
>>>>>>> cannot
>>>>>>> fork.
>>>>>>> So start with looking at how much memory Squid and its helpers use.
>>>>>>> Do you have other processes on this system that consume a lot of
>>>>>>> memory?
>>>>>>>
>>>>>>> Also note that ufdbGuard uses less memory than squidGuard.
>>>>>>> If there are 30 helpers squidguard uses 300% more memory than
>>>>>>> ufdbGuard.
>>>>>>>
>>>>>>> Look at the wiki for more information about memory usage:
>>>>>>> https://wiki.squid-cache.org/SquidFaq/SquidMemory   (currently has an
>>>>>>> expired certificate but it is safe to go ahead)
>>>>>>>
>>>>>>> Marcus
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 08/11/17 07:26, Bike dernikov1 wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Hi, I hope that someone can explain what happened, why squid stopped
>>>>>>>> working.
>>>>>>>> The problem is related to  memory/swap handling.
>>>>>>>>
>>>>>>>> After we changed vm.swappiness parameter from 60 to 10 (tuning
>>>>>>>> attempt, to lower a disk usage, because we have only 4 disks in a
>>>>>>>> RAID10, so disk subsystem  is a weak link), we got a lot of errors
>>>>>>>> in
>>>>>>>> cache.log.
>>>>>>>> The problems started after scheduled logrotate after  2AM.
>>>>>>>> Squid ran out of memory, auth helpers stopped working.
>>>>>>>> It's weird because we didn't disable swap, but behavior is like we
>>>>>>>> did.
>>>>>>>> After an error, we increased parameter from 10 to 40.
>>>>>>>>
>>>>>>>> The server has 24GB DDR3 memory,  disk swap set to 24GB, 12 CPU
>>>>>>>> (24HT
>>>>>>>> cores).
>>>>>>>> We have 2800 users, using  kerberos authentication, squidguard for
>>>>>>>> filtering, ldap authorization.
>>>>>>>> When problem appeared memory was still 3GB free (free column), ram
>>>>>>>> (caching) was filled to 15GB, so 21 GB ram filled, 3GB free.
>>>>>>>>
>>>>>>>> Thanks for help,
>>>>>>>>
>>>>>>>>
>>>>>>>> errors from cache.log.
>>>>>>>>
>>>>>>>> 2017/11/08 02:55:27| Set Current Directory to /var/log/squid/
>>>>>>>> 2017/11/08 02:55:27 kid1| storeDirWriteCleanLogs: Starting...
>>>>>>>> 2017/11/08 02:55:27 kid1|   Finished.  Wrote 0 entries.
>>>>>>>> 2017/11/08 02:55:27 kid1|   Took 0.00 seconds (  0.00 entries/sec).
>>>>>>>> 2017/11/08 02:55:27 kid1| logfileRotate:
>>>>>>>> daemon:/var/log/squid/access.log
>>>>>>>> 2017/11/08 02:55:27 kid1| logfileRotate:
>>>>>>>> daemon:/var/log/squid/access.log
>>>>>>>> 2017/11/08 02:55:28 kid1| Pinger socket opened on FD 30
>>>>>>>> 2017/11/08 02:55:28 kid1| helperOpenServers: Starting 1/1000
>>>>>>>> 'squidGuard' processes
>>>>>>>> 2017/11/08 02:55:28 kid1| ipcCreate: fork: (12) Cannot allocate
>>>>>>>> memory
>>>>>>>> 2017/11/08 02:55:28 kid1| WARNING: Cannot run '/usr/bin/squidGuard'
>>>>>>>> process.
>>>>>>>> 2017/11/08 02:55:28 kid1| helperOpenServers: Starting 300/3000
>>>>>>>> 'negotiate_kerberos_auth' processes
>>>>>>>> 2017/11/08 02:55:28 kid1| ipcCreate: fork: (12) Cannot allocate
>>>>>>>> memory
>>>>>>>> 2017/11/08 02:55:28 kid1| WARNING: Cannot run
>>>>>>>> '/usr/lib/squid/negotiate_kerberos_auth' process.
>>>>>>>> 2017/11/08 02:55:28 kid1| ipcCreate: fork: (12) Cannot allocate
>>>>>>>> memory
>>>>>>>> 2017/11/08 02:55:28 kid1| WARNING: Cannot run
>>>>>>>> '/usr/lib/squid/negotiate_kerberos_auth' process.
>>>>>>>> 2017/11/08 02:55:28 kid1| ipcCreate: fork: (12) Cannot allocate
>>>>>>>> memory
>>>>>>>> 2017/11/08 02:55:28 kid1| WARNING: Cannot run
>>>>>>>> '/usr/lib/squid/negotiate_kerberos_auth' process.
>>>>>>>>
>>>>>>>> external ACL 'memberof' queue overload. Using stale result.
>>>>>>>> _______________________________________________
>>>>>>>> squid-users mailing list
>>>>>>>> squid-users@xxxxxxxxxxxxxxxxxxxxx
>>>>>>>> http://lists.squid-cache.org/listinfo/squid-users
>>>>>>>>