On Tue, Nov 24, 2009 at 9:52 PM, Marcus Kool <marcus.kool@xxxxxxxxxxxxxxx> wrote:
> Linda started this thread with huge performance problems
> when Squid with a size of 12 GB forks 15 times.

I'm still not convinced that our memory usage is correct. Here's where the cache_mem 10GB process is at right now:

Maximum Resident Size: 19376000 KB
Total accounted:        9735764 KB
Storage Mem size:       9296872 KB
338229 StoreEntries
338229 StoreEntries with MemObjects

So there is (9735764 - 9296872) 438,892 KB of "overhead" (memory accounted for that is not used for storing objects) and (19376000 - 9735764) 9,640,236 KB "unaccounted". I will put the whole memory report at the bottom of this message, though it is very hard to read. :-)

> Linda emailed me that she is doing a test with
> vm.pmap.pg_ps_enabled set to 1 (the kernel will
> transparently transform 4K pages into superpages)
> which gives a big relief for TLB management
> and will most likely reduce performance problems with
> forks of very large processes.

I apologize; I meant to send that to the whole list. In any case, it did not help. This is from top while it was doing the daily rotation a few moments ago:

  PID USERNAME  THR PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
 1205 squid       1 118    0 18650M 18675M CPU6   1  57:22 97.85% squid
74418 squid       1 118    0 16259M    24K CPU2   2   0:05 97.66% squid

It doesn't look like enabling superpages had much effect:

$ sysctl vm.pmap
vm.pmap.pmap_collect_active: 0
vm.pmap.pmap_collect_inactive: 0
vm.pmap.pv_entry_spare: 6673
vm.pmap.pv_entry_allocs: 105465793
vm.pmap.pv_entry_frees: 100913786
vm.pmap.pc_chunk_tryfail: 0
vm.pmap.pc_chunk_frees: 610560
vm.pmap.pc_chunk_allocs: 637695
vm.pmap.pc_chunk_count: 27135
vm.pmap.pv_entry_count: 4552007   <--------
vm.pmap.pde.promotions: 12657     <------
vm.pmap.pde.p_failures: 222291
vm.pmap.pde.mappings: 0
vm.pmap.pde.demotions: 11968      <-------
vm.pmap.shpgperproc: 200
vm.pmap.pv_entry_max: 7330186
vm.pmap.pg_ps_enabled: 1

Here are similar readings from the 16GB machine, where the squid process has a VSZ of 8599M:

vm.pmap.pmap_collect_active: 0
vm.pmap.pmap_collect_inactive: 0
vm.pmap.pv_entry_spare: 16299
vm.pmap.pv_entry_allocs: 2196230824
vm.pmap.pv_entry_frees: 2193907723
vm.pmap.pc_chunk_tryfail: 0
vm.pmap.pc_chunk_frees: 13548719
vm.pmap.pc_chunk_allocs: 13562644
vm.pmap.pc_chunk_count: 13925
vm.pmap.pv_entry_count: 2323101   <------------
vm.pmap.pde.promotions: 0
vm.pmap.pde.p_failures: 0
vm.pmap.pde.mappings: 0
vm.pmap.pde.demotions: 0
vm.pmap.shpgperproc: 200
vm.pmap.pv_entry_max: 4276871
vm.pmap.pg_ps_enabled: 0

Squid uses almost all of the memory in use on both systems, so the rough values are:

No superpages:  2323101 / 8599  = 270 pages per MB
Superpages on:  4552007 / 18650 = 244 pages per MB

That's not quite fair, since it compares the whole system's page tables against squid's address space alone, but these machines are all about squid, so it's close enough. I also added up the VSZ of every process on the system and divided it by the current pv entry count; the system average is just 4.554 KB per mapped page, so it seems there are barely any superpages in use. That is borne out by the (promotions - demotions) figure of (12657 - 11968) 689 active superpages, assuming that means what I think it means.

> can we modify Squid to fork at a slower pace?

I think there is a config option (sleep_after_fork) for that already. But it would not help, as the processes are already forked one at a time and the page table duplication happens before the exec.
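To be clear about why the sleep does not help, the spawn sequence is basically fork-then-exec. A minimal sketch of that pattern, not Squid's actual helper-spawning code; the helper path, the count of 16, and the one-second delay are just placeholders:

/* Rough sketch of the fork-then-exec pattern, NOT Squid's actual
 * helper-spawning code; "/usr/local/bin/helper" and the one-second
 * sleep are made-up placeholders.  The point is that the kernel builds
 * the child's copy of the page tables inside fork() itself, so a
 * sleep_after_fork style delay only spaces the forks out; it does not
 * make any single fork cheaper. */
#include <sys/types.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int
main(void)
{
    for (int i = 0; i < 16; i++) {   /* one helper at a time */
        pid_t pid = fork();          /* page-table duplication happens here,
                                      * proportional to the (huge) parent  */
        if (pid < 0) {
            perror("fork");
            exit(1);
        }
        if (pid == 0) {
            /* child: the duplicated address space is thrown away on exec */
            execl("/usr/local/bin/helper", "helper", (char *)NULL);
            _exit(1);
        }
        sleep(1);                    /* the sleep_after_fork equivalent */
    }
    return 0;
}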
Even if it did help, sleep_after_fork would have to be several seconds, and squid would still be nonresponsive for sleep_after_fork * 16. It sounds like this problem is a bit more serious and complicated than I thought. I really hoped it was a config error on our part. :-(

Here is the full memory report, taken as the process VSZ crossed over 19,001 MB; the pools account for 9,794,810 KB allocated across 9,317,346 objects:

Current memory usage:

Pool                           Obj Size   Allocated                                        In Use                             Idle               Allocations Saved        Hit Rate
                               (bytes)    (#) (KB) high (KB) high (hrs) impact (%total)    (#) (KB) high (KB) portion (%alloc)   (#) (KB) high (KB)   (number) (%num) (%vol)   (%num) (number)
2K Buffer                      2048    96 192 248 0.52 0    91 182 248 95    5 10 94    34958835 4.42 18.12    99.96 34972340
4K Buffer                      4096    199 796 3904 11.98 0    194 776 3904 97    5 20 1720    7225790 0.91 7.49    98.41 7342427
8K Buffer                      8192    21 168 272 12.05 0    17 136 272 81    4 32 176    1880965 0.24 3.90    98.40 1911548
16K Buffer                     16384    3 48 64 13.39 0    0 0 64 0    3 48 64    3600 0.00 0.01    98.41 3658
32K Buffer                     32768    2 64 128 14.56 0    0 0 128 0    2 64 128    369 0.00 0.00    98.14 376
64K Buffer                     65536    2 128 192 12.81 0    0 0 192 0    2 128 128    88 0.00 0.00    95.65 92
Store Client Buffer            4096    22 88 1052 12.05 0    11 44 1052 50    11 44 812    30568572 3.87 31.69    99.74 30646811
acl                            64    11 1 1 15.35 0    11 1 1 100    0 0 1    11 0.00 0.00    50.00 22
acl_ip_data                    24    7 1 1 15.35 0    7 1 1 100    0 0 1    7 0.00 0.00    50.00 14
acl_list                       24    18 1 1 15.35 0    18 1 1 100    0 0 1    18 0.00 0.00    50.00 36
dlink_node                     24    337 8 8 0.52 0    0 0 8 0    337 8 8    1088 0.00 0.00    76.35 1425
FwdServer                      24    41 1 6 12.05 0    9 1 6 22    32 1 5    3875752 0.49 0.02    99.87 3880603
HttpReply                      168    338927 55606 55606 0.00 1    338922 55605 55606 100    5 1 23    10817354 1.37 0.46    96.17 11248509
HttpHeaderEntry                40    2602377 101656 101656 0.00 1    2602282 101652 101656 100    95 4 101    200511069 25.36 2.03    98.06 204487269
HttpHdrCc                      40    27921 1091 1091 0.01 0    27914 1091 1091 100    7 1 3    7294400 0.92 0.07    99.17 7355123
HttpHdrRangeSpec               16    96 2 2 13.43 0    0 0 2 0    96 2 2    25596 0.00 0.00    99.31 25775
HttpHdrRange                   16    2 1 1 12.07 0    0 0 1 0    2 1 1    21759 0.00 0.00    99.44 21881
HttpHdrContRange               24    5 1 1 13.46 0    0 0 1 0    5 1 1    10713 0.00 0.00    97.21 11021
intlist                        16    3 1 1 15.35 0    3 1 1 100    0 0 1    3 0.00 0.00    50.00 6
MemObject                      272    338920 90026 90026 0.00 1    338915 90025 90026 100    5 2 39    5022931 0.64 0.35    93.21 5388889
mem_node                       4112    2338883 9392078 9392078 0.00 96    2338871 9392029 9392078 100    12 49 3908    15349368 1.94 15.98    81.20 18902877
relist                         48    8 1 1 15.35 0    8 1 1 100    0 0 1    8 0.00 0.00    50.00 16
request_t                      1360    341 453 893 12.05 0    326 433 893 96    15 20 496    9456631 1.20 3.26    98.60 9590792
StoreEntry                     88    338920 29126 29126 0.00 0    338915 29126 29126 100    5 1 13    5025544 0.64 0.11    93.26 5388889
wordlist                       16    14 1 1 15.35 0    11 1 1 79    3 1 1    88 0.00 0.00    86.27 102
ClientInfo                     344    28096 9439 14657 12.40 0    15714 5279 14657 56    12382 4160 4852    653528 0.08 0.06    93.63 697962
MD5 digest                     16    338920 5296 5296 0.00 0    338915 5296 5296 100    5 1 3    13112283 1.66 0.05    97.40 13462778
helper_request                 40    353 14 14 0.52 0    1 1 14 0    352 14 14    4805587 0.61 0.05    99.99 4805971
storeSwapTLV                   24    5 1 1 15.32 0    0 0 1 0    5 1 1    47627493 6.02 0.29    100.00 47627498
Short Strings                  36    2939320 103336 103336 0.00 1    2939241 103333 103336 100    79 3 75    219162796 27.72 2.00    98.15 223283726
Medium Strings                 128    15166 1896 1951 0.52 0    15126 1891 1951 100    40 5 89    50029245 6.33 1.62    99.39 50334410
Long Strings                   512    5927 2964 3002 0.52 0    5911 2956 3002 100    16 8 143    9251916 1.17 1.20    99.22 9325056
cbdata acl_access (1)          56    13 1 1 15.35 0    13 1 1 100    0 0 1    13 0.00 0.00    50.00 26
cbdata aclCheck_t (2)          312    1 1 1 15.32 0    0 0 1 0    1 1 1    22575783 2.86 1.78    100.00 22575832
cbdata clientHttpRequest (3)   1096    21 23 481 0.52 0    15 17 481 71    6 7 387    4735510 0.60 1.31    98.21 4821935
cbdata ConnStateData (4)       320    194 61 299 11.98 0    190 60 299 98    4 2 137    2604058 0.33 0.21    97.41 2673373
cbdata ErrorState (5)          160    200 32 32 0.06 0    199 32 32 100    1 1 1    16266 0.00 0.00    98.79 16466
cbdata FwdState (6)            96    31 3 24 12.05 0    9 1 24 29    22 3 20    3871284 0.49 0.09    99.76 3880603
cbdata generic_cbdata (7)      32    30 1 4 15.32 0    0 0 4 0    30 1 4    14750620 1.87 0.12    100.00 14750938
cbdata helper (8)              136    1 1 1 15.35 0    1 1 1 100    0 0 0    0 0.00 0.00    0.00 1
cbdata helper_server (9)       152    16 3 3 15.35 0    16 3 3 100    0 0 3    32 0.00 0.00    66.67 48
cbdata HttpStateData (12)      152    38 6 37 12.05 0    9 2 37 24    29 5 32    3874204 0.49 0.15    99.85 3880194
cbdata ps_state (14)           200    1 1 1 15.32 0    0 0 1 0    1 1 1    3880602 0.49 0.20    100.00 3880603
cbdata RemovalPolicy (15)      104    1 1 1 15.35 0    1 1 1 100    0 0 0    0 0.00 0.00    0.00 1
cbdata store_client (18)       128    167 21 45 12.05 0    133 17 45 80    34 5 29    8643422 1.09 0.28    99.89 8652735
event                          48    34 2 3 12.38 0    31 2 3 91    3 1 1    8007262 1.01 0.10    99.96 8010458
cbdata http_port_list (19)     128    2 1 1 14.75 0    2 1 1 100    0 0 0    0 0.00 0.00    0.00 2
cbdata body_size (20)          64    1 1 1 15.35 0    1 1 1 100    0 0 1    1 0.00 0.00    50.00 2
cbdata ConnectStateData (21)   96    16 2 11 15.32 0    0 0 11 0    16 2 11    3826157 0.48 0.09    99.96 3827561
close_handler                  24    236 6 31 12.05 0    224 6 31 95    12 1 13    14200692 1.80 0.09    99.57 14261946
ipcache_entry                  128    978 123 137 13.32 0    951 119 137 97    27 4 22    611237 0.08 0.02    98.86 618298
fqdncache_entry                160    20 4 4 15.35 0    20 4 4 100    0 0 4    24 0.00 0.00    54.55 44
cbdata idns_query (22)         8680    2 17 941 15.32 0    0 0 941 0    2 17 941    616884 0.08 1.36    99.78 618218
cbdata redirectStateData (23)  72    353 25 25 0.52 0    1 1 25 0    352 25 25    4805595 0.61 0.09    99.99 4805971
cbdata Logfile (26)            1120    3 4 4 15.35 0    3 4 4 100    0 0 4    3 0.00 0.00    50.00 6
pconn_data                     32    6 1 1 13.34 0    1 1 1 17    5 1 1    61876 0.01 0.00    99.69 62069
pconn_fds                      32    6 1 1 13.34 0    1 1 1 17    5 1 1    61889 0.01 0.00    99.71 62069
cbdata AddVaryState (29)       160    1 1 2 12.38 0    0 0 2 0    1 1 2    305954 0.04 0.01    99.84 306435
cbdata LocateVaryState (30)    136    3 1 1 12.38 0    0 0 1 0    3 1 1    571615 0.07 0.02    99.99 571661
VaryData                       32    6 1 1 13.62 0    1 1 1 17    5 1 1    570046 0.07 0.00    99.72 571661
cbdata StoreUpdateState (31)   4152    2 9 41 12.34 0    0 0 41 0    2 9 41    1044654 0.13 1.10    99.71 1047729
Total                          -    9317346 9794810 9794810 0.00 100    9303255 9790117 9790185 100    14091 4693 5120    776329091 98.19 95.79    98.19 790614788

Cumulative allocated volume: 395.07 GB
Current overhead: 301304 bytes (0.003%)
Idle pool limit: 5.00 MB
memPoolAlloc calls: 790614788
memPoolFree calls: 781311532

String Pool      Impact (%strings) (%volume)
Short Strings            99         96
Medium Strings            1          2
Long Strings              0          3
Other Strings             0          0

Large buffers: 0 (0 KB)

Are we 100% certain this is normal / correct? Cutting the process size in half would double the process spawn rate. :)

Thanks!
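P.S. For anyone who wants to check my reading of the report, here is the back-of-the-envelope arithmetic as a throwaway C program. The numbers are copied from the report above; nothing in it is Squid code, and the VSZ figure is the 19,001 MB snapshot value.

/* Back-of-the-envelope check using numbers copied from the report above.
 * This is not Squid code, just the arithmetic spelled out. */
#include <stdio.h>

int
main(void)
{
    long long mem_node_size  = 4112;       /* bytes per mem_node: 4 KB of data plus header */
    long long mem_node_count = 2338883;    /* mem_nodes currently allocated                */
    long long pool_total_kb  = 9794810;    /* total KB allocated across all pools          */
    long long process_vsz_kb = 19001LL * 1024;  /* process VSZ when the report was taken   */

    long long mem_node_kb = mem_node_size * mem_node_count / 1024;

    printf("mem_node pool:    %lld KB (%.1f%% of all pool memory)\n",
           mem_node_kb, 100.0 * mem_node_kb / pool_total_kb);
    printf("outside any pool: %lld KB\n", process_vsz_kb - pool_total_kb);

    /* Prints roughly 9392077 KB / 95.9%, and about 9.66 million KB that no
     * pool accounts for. */
    return 0;
}

In other words, what the pools do account for is almost entirely cache_mem payload sitting in mem_node; the open question is the other ~9.6 GB of the process that the report says nothing about.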