On Wed, Nov 25, 2009 at 7:43 AM, Marcus Kool <marcus.kool@xxxxxxxxxxxxxxx> wrote:
> The result of the test with vm.pmap.pg_ps_enabled set to 1
> is ... different than what I expected.
> The values of vm.pmap.pde.p_failures and vm.pmap.pde.demotions
> indicate that the page daemon has problems creating and
> maintaining superpages. Maybe the load is too high
> to create superpages or the algorithm of the page daemon
> needs to be tuned.

Well, the load on that machine is 0.09. :-)

> But one last try from me: The machine has 24 GB and Squid has
> 19 GB. I guess that on the first fork the OS demotes many
> superpages because it needs to map the child process to
> virtual memory and superpages cannot be swapped and therefore
> will be demoted. The second fork demotes more superpages...
> To make the first fork fast, Squid must be
> less than 10 GB because Squid and its child fit within
> physical memory.

The demotions occur prior to the fork; I was able to watch both counters increment during the day yesterday. But what you are describing is where we are now: assign no more than 25% of system memory to cache_mem and accept that 50% is wasted at all times except during log rotation.

I did make one change last night which appears to have made a big difference. I noticed this in the memory report I posted:

    Idle pool limit: 5.00 MB

We did not have such a value anywhere in our config file, and according to the documentation the default is unlimited, so I don't know where that value came from; maybe a documentation tweak is appropriate. But in light of it, I added "memory_pools_limit 10 GB" and restarted Squid.
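For the record, the change amounts to one squid.conf directive; a sketch of the edit (the 10 GB value is just what fits our 24 GB box, and `squid -k parse` can sanity-check the file before restarting):

```
# squid.conf: override the unexplained 5 MB "Idle pool limit" seen in
# the memory report. Value sized for this particular 24 GB machine.
memory_pools_limit 10 GB
```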
The memory wastage is the same (still 50% missing with no explanation), but the pmap values are quite different:

    vm.pmap.pmap_collect_active: 0
    vm.pmap.pmap_collect_inactive: 0
    vm.pmap.pv_entry_spare: 10202
    vm.pmap.pv_entry_allocs: 203328446
    vm.pmap.pv_entry_frees: 203268592
    vm.pmap.pc_chunk_tryfail: 0
    vm.pmap.pc_chunk_frees: 1231037
    vm.pmap.pc_chunk_allocs: 1231454
    vm.pmap.pc_chunk_count: 417
    vm.pmap.pv_entry_count: 59854     <---------
    vm.pmap.pde.promotions: 16718     <----------
    vm.pmap.pde.p_failures: 247130
    vm.pmap.pde.mappings: 0
    vm.pmap.pde.demotions: 12403      <-----------
    vm.pmap.shpgperproc: 200
    vm.pmap.pv_entry_max: 7330186
    vm.pmap.pg_ps_enabled: 1

The memory usage of squid at this time is only 7544M, but the whole-system VSZ K/pv_entry is way up to 160 (from 4.55 before), and the number of superpages is much higher at (16718 - 12403 = ) 4315. Also, I cannot watch "demotions" increase in real time as before; it has been sitting at 12403 all morning, which is barely higher than the 11968 reading last night, well before the change. So we will leave it running until rotation tonight to see if this has any effect. One can hope it makes squid fork 40x faster. :)

> There are alternative solutions to the problem:
> 1. redesign the URL rewriter into a multithreaded application that
>    accepts multiple requests from Squid simultaneously (use
>    url_rewrite_concurrency in squid.conf)
>    This way there will be only one child process and only one fork on
>    'squid -k rotate'

As with multithreading in general, that would entail a lot of work to turn a very simple, easy-to-verify, reliable program into a very complex, hard-to-verify, error-prone one. It's something we've talked about doing, but there is not much enthusiasm for doing it to something so important.

> 2. redesign the URL rewriter where the URL rewriter rereads
>    its configuration/database on receipt of a signal and
>    send this signal every 24 hours without doing the 'squid -k rotate'
>    This way squid does not fork.
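As a sanity check on the superpage arithmetic above, assuming this is amd64, where a superpage is 2 MB (512 x 4 KB base pages) — the counter values are copied from the sysctl output in this message:

```shell
# Net superpages currently standing = promotions minus demotions,
# using the vm.pmap.pde counters quoted above.
promotions=16718
demotions=12403
net=$((promotions - demotions))
echo "net superpages: ${net}"                 # 4315
echo "memory in superpages: $((net * 2)) MB"  # 8630 MB, roughly 8.4 GB
```

That ~8.4 GB is at least in the same ballpark as squid's 7544M, which would make sense if most of squid's memory is what got promoted.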
> (maybe you want a 'squid -k rotate' once per week though).

We don't want/need the URL rewriter to stop *ever*; it has no need for that. The -k rotate is solely to keep on top of the GB of logs Squid generates every day.

> 4. use less URL rewriters. You might get an occasional
>    'not enough rewriters' warning from Squid in which case the
>    redirector might be bypassed (use url_rewrite_bypass).
>    Depending on what exactly the URL rewriter does, this
>    might be acceptable.
>    This way Squid does less forks.

We cannot bypass the rewriters, and we have already cut them from 48 to 16. In the ordinary case one rewriter can already handle the load: they take on average 150us to respond, so one rewriter can handle about 6600 requests per second, and the average workload is only 3000 - 10000 requests per minute. But there are sometimes non-average cases, so we would not want to serialize incoming requests, and we may get slammed or abused with very high request rates, during which we need the extras.

Provided the superpages now have a positive effect, the only thing left to do will be to get to the bottom of the memory usage situation. Does everybody, especially on other platforms (e.g. Linux, Solaris), see this same behavior where the process VSZ is double what squid can account for?

Thanks!
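The rewriter headroom figures above work out like this (integer arithmetic, using the 150us average service time and the top of the 3000-10000 requests/minute range quoted in the text):

```shell
# Rough capacity check for a single URL rewriter.
service_us=150                            # average response time per request
per_rewriter=$((1000000 / service_us))    # requests/sec one rewriter can absorb
avg_load=$((10000 / 60))                  # requests/sec at the top of the average range
echo "one rewriter: ~${per_rewriter} req/s"   # ~6666 req/s
echo "average load: ~${avg_load} req/s"       # ~166 req/s
```

So one rewriter covers the average load roughly 40x over; the other 15 exist purely for the non-average spikes.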