On Wed, Nov 25, 2009 at 7:43 AM, Marcus Kool <marcus.kool@xxxxxxxxxxxxxxx> wrote:
> The result of the test with vm.pmap.pg_ps_enabled set to 1
> is ... different than what I expected.
> The values of vm.pmap.pde.p_failures and vm.pmap.pde.demotions
> indicate that the page daemon has problems creating and
> maintaining superpages. Maybe the load is too high
> to create superpages or the algorithm of the page daemon
> needs to be tuned.

Well, the load on that machine is 0.09. :-)

> But one last try from me: The machine has 24 GB and Squid has
> 19 GB. I guess that on the first fork the OS demotes many
> superpages because it needs to map the child process to
> virtual memory and superpages cannot be swapped and therefore
> will be demoted. The second fork demotes more superpages...
> To make the first fork fast, Squid must be
> less than 10 GB because Squid and its child fit within
> physical memory.

The demotions occur prior to the fork; I was able to watch both counters increment during the day yesterday. But what you are describing is where we are now: assign no more than 25% of system memory to cache_mem and accept that 50% is wasted at all times except during log rotation.

I did make one change last night which appears to have made a big difference. I noticed this in the memory report I posted:

    Idle pool limit: 5.00 MB

We did not have such a value anywhere in our config file, and according to the documentation the default is unlimited, so I don't know where that value came from; maybe a documentation tweak is appropriate. But in light of it, I added "memory_pools_limit 10 GB" and restarted Squid.
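For the record, the change amounts to one squid.conf directive; a sketch of the edit (the 10 GB value is just what fits our 24 GB box, and `squid -k parse` can sanity-check the file before restarting):

```
# squid.conf: override the unexplained 5 MB "Idle pool limit" seen in
# the memory report. Value sized for this particular 24 GB machine.
memory_pools_limit 10 GB
```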
The memory wastage is the same (still 50% missing with no explanation), but the pmap values are quite different:

    vm.pmap.pmap_collect_active: 0
    vm.pmap.pmap_collect_inactive: 0
    vm.pmap.pv_entry_spare: 10202
    vm.pmap.pv_entry_allocs: 203328446
    vm.pmap.pv_entry_frees: 203268592
    vm.pmap.pc_chunk_tryfail: 0
    vm.pmap.pc_chunk_frees: 1231037
    vm.pmap.pc_chunk_allocs: 1231454
    vm.pmap.pc_chunk_count: 417
    vm.pmap.pv_entry_count: 59854     <---------
    vm.pmap.pde.promotions: 16718     <----------
    vm.pmap.pde.p_failures: 247130
    vm.pmap.pde.mappings: 0
    vm.pmap.pde.demotions: 12403      <-----------
    vm.pmap.shpgperproc: 200
    vm.pmap.pv_entry_max: 7330186
    vm.pmap.pg_ps_enabled: 1

The memory usage of squid at this time is only 7544M, but the whole-system VSZ K/pv_entry is way up to 160 (from 4.55 before), and the number of superpages is much higher at (16718 - 12403 = ) 4315. Also, I cannot watch "demotions" increase in real time as before; it has been sitting at 12403 all morning, which is barely higher than the 11968 reading last night, well before the change. So we will leave it running until rotation tonight to see if this has any effect. One can hope it makes squid fork 40x faster. :)

> There are alternative solutions to the problem:
> 1. redesign the URL rewriter into a multithreaded application that
>    accepts multiple requests from Squid simultaneously (use
>    url_rewrite_concurrency in squid.conf)
>    This way there will be only one child process and only one fork on
>    'squid -k rotate'

As with multithreading in general, that would entail a lot of work to turn a very simple, easy-to-verify, reliable program into a very complex, hard-to-verify, error-prone one. It's something we've talked about doing, but there is not much enthusiasm for doing it to something so important.

> 2. redesign the URL rewriter where the URL rewriter rereads
>    its configuration/database on receipt of a signal and
>    send this signal every 24 hours without doing the 'squid -k rotate'
>    This way squid does not fork.
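As a sanity check on the superpage arithmetic above, assuming this is amd64, where a superpage is 2 MB (512 x 4 KB base pages) — the counter values are copied from the sysctl output in this message:

```shell
# Net superpages currently standing = promotions minus demotions,
# using the vm.pmap.pde counters quoted above.
promotions=16718
demotions=12403
net=$((promotions - demotions))
echo "net superpages: ${net}"                 # 4315
echo "memory in superpages: $((net * 2)) MB"  # 8630 MB, roughly 8.4 GB
```

That ~8.4 GB is at least in the same ballpark as squid's 7544M, which would make sense if most of squid's memory is what got promoted.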
> (maybe you want a 'squid -k rotate' once per week though).

We don't want/need the URL rewriter to stop *ever*; it has no need for that. The -k rotate is solely to keep on top of the GB of logs Squid generates every day.

> 4. use less URL rewriters. You might get an occasional
>    'not enough rewriters' warning from Squid in which case the
>    redirector might be bypassed (use url_rewrite_bypass).
>    Depending on what exactly the URL rewriter does, this
>    might be acceptable.
>    This way Squid does less forks.

We cannot bypass the rewriters, and we have already cut them from 48 to 16. In the ordinary case one rewriter can already handle the load: they take on average 150us to respond, so one rewriter can handle about 6600 requests per second, and the average workload is only 3000 - 10000 requests per minute. But there are sometimes non-average cases, so we would not want to serialize incoming requests, and we may get slammed or abused with very high request rates, during which we need the extras.

Provided the superpages now have a positive effect, the only thing left to do will be to get to the bottom of the memory usage situation. Does everybody, especially on other platforms (e.g. Linux, Solaris), see this same behavior where the process VSZ is double what squid can account for?

Thanks!
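The rewriter headroom figures above work out like this (integer arithmetic, using the 150us average service time and the top of the 3000-10000 requests/minute range quoted in the text):

```shell
# Rough capacity check for a single URL rewriter.
service_us=150                            # average response time per request
per_rewriter=$((1000000 / service_us))    # requests/sec one rewriter can absorb
avg_load=$((10000 / 60))                  # requests/sec at the top of the average range
echo "one rewriter: ~${per_rewriter} req/s"   # ~6666 req/s
echo "average load: ~${avg_load} req/s"       # ~166 req/s
```

So one rewriter covers the average load roughly 40x over; the other 15 exist purely for the non-average spikes.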