On Tue, Nov 24, 2009 at 9:52 PM, Marcus Kool <marcus.kool@xxxxxxxxxxxxxxx> wrote:
> Linda started this thread with huge performance problems
> when Squid with a size of 12 GB forks 15 times.

I'm still not convinced that our memory usage is correct. Here's where the cache_mem 10GB process is at right now:

Maximum Resident Size: 19376000 KB
Total accounted:        9735764 KB
Storage Mem size:       9296872 KB
338229 StoreEntries
338229 StoreEntries with MemObjects

So there is (9735764 - 9296872) 438,892 KB of "overhead" (memory accounted for that is not used for storing objects) and (19376000 - 9735764) 9,640,236 KB "unaccounted". I will put the whole memory report at the bottom of this message, though it is very hard to read. :-)

> Linda emailed me that she is doing a test with
> vm.pmap.pg_ps_enabled set to 1 (the kernel will
> transparently transform 4K pages into superpages)
> which gives a big relief for TLB management
> and will most likely reduce performance problems with
> forks of very large processes.

I apologize; I meant to send that to the whole list. In any case, it did not help. This is from top while it was doing the daily rotation a few moments ago:

  PID USERNAME  THR PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
 1205 squid       1 118    0 18650M 18675M CPU6   1  57:22 97.85% squid
74418 squid       1 118    0 16259M    24K CPU2   2   0:05 97.66% squid

It doesn't look like enabling superpages had much effect:

$ sysctl vm.pmap
vm.pmap.pmap_collect_active: 0
vm.pmap.pmap_collect_inactive: 0
vm.pmap.pv_entry_spare: 6673
vm.pmap.pv_entry_allocs: 105465793
vm.pmap.pv_entry_frees: 100913786
vm.pmap.pc_chunk_tryfail: 0
vm.pmap.pc_chunk_frees: 610560
vm.pmap.pc_chunk_allocs: 637695
vm.pmap.pc_chunk_count: 27135
vm.pmap.pv_entry_count: 4552007   <--------
vm.pmap.pde.promotions: 12657     <------
vm.pmap.pde.p_failures: 222291
vm.pmap.pde.mappings: 0
vm.pmap.pde.demotions: 11968      <-------
vm.pmap.shpgperproc: 200
vm.pmap.pv_entry_max: 7330186
vm.pmap.pg_ps_enabled: 1

Here are similar readings from the 16GB machine, where the squid process has a VSZ of 8599M:

vm.pmap.pmap_collect_active: 0
vm.pmap.pmap_collect_inactive: 0
vm.pmap.pv_entry_spare: 16299
vm.pmap.pv_entry_allocs: 2196230824
vm.pmap.pv_entry_frees: 2193907723
vm.pmap.pc_chunk_tryfail: 0
vm.pmap.pc_chunk_frees: 13548719
vm.pmap.pc_chunk_allocs: 13562644
vm.pmap.pc_chunk_count: 13925
vm.pmap.pv_entry_count: 2323101   <------------
vm.pmap.pde.promotions: 0
vm.pmap.pde.p_failures: 0
vm.pmap.pde.mappings: 0
vm.pmap.pde.demotions: 0
vm.pmap.shpgperproc: 200
vm.pmap.pv_entry_max: 4276871
vm.pmap.pg_ps_enabled: 0

Squid uses almost all of the memory in use on both systems, so the rough values are:

No superpages:  2323101 / 8599  = 270 pages per MB
Superpages on:  4552007 / 18650 = 244 pages per MB

That's not quite fair, since it compares the whole system's page tables against squid's address space alone, but these machines are all about squid, so it's close enough. I also added up the VSZ of every process on the system and divided it by the current pv entry count; the system average is just 4.554 KB per mapped page, so it seems there are barely any superpages in use. That is borne out by the (promotions - demotions) figure of (12657 - 11968) 689 active superpages, assuming that means what I think it means.

> can we modify Squid to fork at a slower pace?

I think there is a config option (sleep_after_fork) for that already. But it would not help, as the processes are already forked one at a time and the page table duplication happens before the exec.
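To be clear about why the sleep does not help, the spawn sequence is basically fork-then-exec. A minimal sketch of that pattern, not Squid's actual helper-spawning code; the helper path, the count of 16, and the one-second delay are just placeholders:

/* Rough sketch of the fork-then-exec pattern, NOT Squid's actual
 * helper-spawning code; "/usr/local/bin/helper" and the one-second
 * sleep are made-up placeholders.  The point is that the kernel builds
 * the child's copy of the page tables inside fork() itself, so a
 * sleep_after_fork style delay only spaces the forks out; it does not
 * make any single fork cheaper. */
#include <sys/types.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int
main(void)
{
    for (int i = 0; i < 16; i++) {   /* one helper at a time */
        pid_t pid = fork();          /* page-table duplication happens here,
                                      * proportional to the (huge) parent  */
        if (pid < 0) {
            perror("fork");
            exit(1);
        }
        if (pid == 0) {
            /* child: the duplicated address space is thrown away on exec */
            execl("/usr/local/bin/helper", "helper", (char *)NULL);
            _exit(1);
        }
        sleep(1);                    /* the sleep_after_fork equivalent */
    }
    return 0;
}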
Even if it did help, sleep_after_fork would have to be several seconds, and squid would still be nonresponsive for sleep_after_fork * 16. It sounds like this problem is a bit more serious and complicated than I thought. I really hoped it was a config error on our part. :-(

Here is the full memory report, taken as the process VSZ crossed over 19,001 MB; the pools account for 9,794,810 KB allocated across 9,317,346 objects:

Current memory usage:

Pool                           Obj Size   Allocated                                        In Use                             Idle               Allocations Saved        Hit Rate
                               (bytes)    (#) (KB) high (KB) high (hrs) impact (%total)    (#) (KB) high (KB) portion (%alloc)   (#) (KB) high (KB)   (number) (%num) (%vol)   (%num) (number)
2K Buffer                      2048    96 192 248 0.52 0    91 182 248 95    5 10 94    34958835 4.42 18.12    99.96 34972340
4K Buffer                      4096    199 796 3904 11.98 0    194 776 3904 97    5 20 1720    7225790 0.91 7.49    98.41 7342427
8K Buffer                      8192    21 168 272 12.05 0    17 136 272 81    4 32 176    1880965 0.24 3.90    98.40 1911548
16K Buffer                     16384    3 48 64 13.39 0    0 0 64 0    3 48 64    3600 0.00 0.01    98.41 3658
32K Buffer                     32768    2 64 128 14.56 0    0 0 128 0    2 64 128    369 0.00 0.00    98.14 376
64K Buffer                     65536    2 128 192 12.81 0    0 0 192 0    2 128 128    88 0.00 0.00    95.65 92
Store Client Buffer            4096    22 88 1052 12.05 0    11 44 1052 50    11 44 812    30568572 3.87 31.69    99.74 30646811
acl                            64    11 1 1 15.35 0    11 1 1 100    0 0 1    11 0.00 0.00    50.00 22
acl_ip_data                    24    7 1 1 15.35 0    7 1 1 100    0 0 1    7 0.00 0.00    50.00 14
acl_list                       24    18 1 1 15.35 0    18 1 1 100    0 0 1    18 0.00 0.00    50.00 36
dlink_node                     24    337 8 8 0.52 0    0 0 8 0    337 8 8    1088 0.00 0.00    76.35 1425
FwdServer                      24    41 1 6 12.05 0    9 1 6 22    32 1 5    3875752 0.49 0.02    99.87 3880603
HttpReply                      168    338927 55606 55606 0.00 1    338922 55605 55606 100    5 1 23    10817354 1.37 0.46    96.17 11248509
HttpHeaderEntry                40    2602377 101656 101656 0.00 1    2602282 101652 101656 100    95 4 101    200511069 25.36 2.03    98.06 204487269
HttpHdrCc                      40    27921 1091 1091 0.01 0    27914 1091 1091 100    7 1 3    7294400 0.92 0.07    99.17 7355123
HttpHdrRangeSpec               16    96 2 2 13.43 0    0 0 2 0    96 2 2    25596 0.00 0.00    99.31 25775
HttpHdrRange                   16    2 1 1 12.07 0    0 0 1 0    2 1 1    21759 0.00 0.00    99.44 21881
HttpHdrContRange               24    5 1 1 13.46 0    0 0 1 0    5 1 1    10713 0.00 0.00    97.21 11021
intlist                        16    3 1 1 15.35 0    3 1 1 100    0 0 1    3 0.00 0.00    50.00 6
MemObject                      272    338920 90026 90026 0.00 1    338915 90025 90026 100    5 2 39    5022931 0.64 0.35    93.21 5388889
mem_node                       4112    2338883 9392078 9392078 0.00 96    2338871 9392029 9392078 100    12 49 3908    15349368 1.94 15.98    81.20 18902877
relist                         48    8 1 1 15.35 0    8 1 1 100    0 0 1    8 0.00 0.00    50.00 16
request_t                      1360    341 453 893 12.05 0    326 433 893 96    15 20 496    9456631 1.20 3.26    98.60 9590792
StoreEntry                     88    338920 29126 29126 0.00 0    338915 29126 29126 100    5 1 13    5025544 0.64 0.11    93.26 5388889
wordlist                       16    14 1 1 15.35 0    11 1 1 79    3 1 1    88 0.00 0.00    86.27 102
ClientInfo                     344    28096 9439 14657 12.40 0    15714 5279 14657 56    12382 4160 4852    653528 0.08 0.06    93.63 697962
MD5 digest                     16    338920 5296 5296 0.00 0    338915 5296 5296 100    5 1 3    13112283 1.66 0.05    97.40 13462778
helper_request                 40    353 14 14 0.52 0    1 1 14 0    352 14 14    4805587 0.61 0.05    99.99 4805971
storeSwapTLV                   24    5 1 1 15.32 0    0 0 1 0    5 1 1    47627493 6.02 0.29    100.00 47627498
Short Strings                  36    2939320 103336 103336 0.00 1    2939241 103333 103336 100    79 3 75    219162796 27.72 2.00    98.15 223283726
Medium Strings                 128    15166 1896 1951 0.52 0    15126 1891 1951 100    40 5 89    50029245 6.33 1.62    99.39 50334410
Long Strings                   512    5927 2964 3002 0.52 0    5911 2956 3002 100    16 8 143    9251916 1.17 1.20    99.22 9325056
cbdata acl_access (1)          56    13 1 1 15.35 0    13 1 1 100    0 0 1    13 0.00 0.00    50.00 26
cbdata aclCheck_t (2)          312    1 1 1 15.32 0    0 0 1 0    1 1 1    22575783 2.86 1.78    100.00 22575832
cbdata clientHttpRequest (3)   1096    21 23 481 0.52 0    15 17 481 71    6 7 387    4735510 0.60 1.31    98.21 4821935
cbdata ConnStateData (4)       320    194 61 299 11.98 0    190 60 299 98    4 2 137    2604058 0.33 0.21    97.41 2673373
cbdata ErrorState (5)          160    200 32 32 0.06 0    199 32 32 100    1 1 1    16266 0.00 0.00    98.79 16466
cbdata FwdState (6)            96    31 3 24 12.05 0    9 1 24 29    22 3 20    3871284 0.49 0.09    99.76 3880603
cbdata generic_cbdata (7)      32    30 1 4 15.32 0    0 0 4 0    30 1 4    14750620 1.87 0.12    100.00 14750938
cbdata helper (8)              136    1 1 1 15.35 0    1 1 1 100    0 0 0    0 0.00 0.00    0.00 1
cbdata helper_server (9)       152    16 3 3 15.35 0    16 3 3 100    0 0 3    32 0.00 0.00    66.67 48
cbdata HttpStateData (12)      152    38 6 37 12.05 0    9 2 37 24    29 5 32    3874204 0.49 0.15    99.85 3880194
cbdata ps_state (14)           200    1 1 1 15.32 0    0 0 1 0    1 1 1    3880602 0.49 0.20    100.00 3880603
cbdata RemovalPolicy (15)      104    1 1 1 15.35 0    1 1 1 100    0 0 0    0 0.00 0.00    0.00 1
cbdata store_client (18)       128    167 21 45 12.05 0    133 17 45 80    34 5 29    8643422 1.09 0.28    99.89 8652735
event                          48    34 2 3 12.38 0    31 2 3 91    3 1 1    8007262 1.01 0.10    99.96 8010458
cbdata http_port_list (19)     128    2 1 1 14.75 0    2 1 1 100    0 0 0    0 0.00 0.00    0.00 2
cbdata body_size (20)          64    1 1 1 15.35 0    1 1 1 100    0 0 1    1 0.00 0.00    50.00 2
cbdata ConnectStateData (21)   96    16 2 11 15.32 0    0 0 11 0    16 2 11    3826157 0.48 0.09    99.96 3827561
close_handler                  24    236 6 31 12.05 0    224 6 31 95    12 1 13    14200692 1.80 0.09    99.57 14261946
ipcache_entry                  128    978 123 137 13.32 0    951 119 137 97    27 4 22    611237 0.08 0.02    98.86 618298
fqdncache_entry                160    20 4 4 15.35 0    20 4 4 100    0 0 4    24 0.00 0.00    54.55 44
cbdata idns_query (22)         8680    2 17 941 15.32 0    0 0 941 0    2 17 941    616884 0.08 1.36    99.78 618218
cbdata redirectStateData (23)  72    353 25 25 0.52 0    1 1 25 0    352 25 25    4805595 0.61 0.09    99.99 4805971
cbdata Logfile (26)            1120    3 4 4 15.35 0    3 4 4 100    0 0 4    3 0.00 0.00    50.00 6
pconn_data                     32    6 1 1 13.34 0    1 1 1 17    5 1 1    61876 0.01 0.00    99.69 62069
pconn_fds                      32    6 1 1 13.34 0    1 1 1 17    5 1 1    61889 0.01 0.00    99.71 62069
cbdata AddVaryState (29)       160    1 1 2 12.38 0    0 0 2 0    1 1 2    305954 0.04 0.01    99.84 306435
cbdata LocateVaryState (30)    136    3 1 1 12.38 0    0 0 1 0    3 1 1    571615 0.07 0.02    99.99 571661
VaryData                       32    6 1 1 13.62 0    1 1 1 17    5 1 1    570046 0.07 0.00    99.72 571661
cbdata StoreUpdateState (31)   4152    2 9 41 12.34 0    0 0 41 0    2 9 41    1044654 0.13 1.10    99.71 1047729
Total                          -    9317346 9794810 9794810 0.00 100    9303255 9790117 9790185 100    14091 4693 5120    776329091 98.19 95.79    98.19 790614788

Cumulative allocated volume: 395.07 GB
Current overhead: 301304 bytes (0.003%)
Idle pool limit: 5.00 MB
memPoolAlloc calls: 790614788
memPoolFree calls: 781311532

String Pool      Impact (%strings) (%volume)
Short Strings            99         96
Medium Strings            1          2
Long Strings              0          3
Other Strings             0          0

Large buffers: 0 (0 KB)

Are we 100% certain this is normal / correct? Cutting the process size in half would double the process spawn rate. :)

Thanks!
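P.S. For anyone who wants to check my reading of the report, here is the back-of-the-envelope arithmetic as a throwaway C program. The numbers are copied from the report above; nothing in it is Squid code, and the VSZ figure is the 19,001 MB snapshot value.

/* Back-of-the-envelope check using numbers copied from the report above.
 * This is not Squid code, just the arithmetic spelled out. */
#include <stdio.h>

int
main(void)
{
    long long mem_node_size  = 4112;       /* bytes per mem_node: 4 KB of data plus header */
    long long mem_node_count = 2338883;    /* mem_nodes currently allocated                */
    long long pool_total_kb  = 9794810;    /* total KB allocated across all pools          */
    long long process_vsz_kb = 19001LL * 1024;  /* process VSZ when the report was taken   */

    long long mem_node_kb = mem_node_size * mem_node_count / 1024;

    printf("mem_node pool:    %lld KB (%.1f%% of all pool memory)\n",
           mem_node_kb, 100.0 * mem_node_kb / pool_total_kb);
    printf("outside any pool: %lld KB\n", process_vsz_kb - pool_total_kb);

    /* Prints roughly 9392077 KB / 95.9%, and about 9.66 million KB that no
     * pool accounts for. */
    return 0;
}

In other words, what the pools do account for is almost entirely cache_mem payload sitting in mem_node; the open question is the other ~9.6 GB of the process that the report says nothing about.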