Tamas Bagyal wrote:
> Rich Megginson wrote:
>> Tamas Bagyal wrote:
>>> Rich Megginson wrote:
>>>> Bagyal Tamas wrote:
>>>>> Rich Megginson wrote:
>>>>>> Tamas Bagyal wrote:
>>>>>>> Hello Ryan,
>>>>>>>
>>>>>>> Have you tried this version? I have two fedora-ds 1.0.4 servers in
>>>>>>> an MMR configuration. I migrated one of them to 1.1 (built following
>>>>>>> your and Rich's instructions), but I have a problem with the memory
>>>>>>> usage of the ns-slapd process. Initially memory usage is 18.5%, but
>>>>>>> after 2 hours it had risen to 23.1%, and it kept growing until the
>>>>>>> process was killed by the kernel (I think...).
>>>>>>>
>>>>>>> The load is mostly reads (DNS) with a few writes (CUPS).
>>>>>>> This is Debian etch with 512 MB of memory (I know this is too
>>>>>>> low, but it is a test environment). The slapd cache size is
>>>>>>> 67108864.
>>>>>> Are you using SSL? Anything interesting in your server error log?
>>>>>
>>>>> I ran setupssl2.sh but do not use any SSL connections. The error
>>>>> log shows nothing, only the server start.
>>>> The reason I ask is that older versions of the NSS crypto/SSL
>>>> libraries had a memory leak. NSS 3.11.7 does not have this
>>>> problem. But you would only see the problem if you were using SSL
>>>> connections.
>>>
>>> OK. I tried again from the beginning: fresh install, no SSL, no migration.
>>> I used setup-ds-admin.pl and set up MMR with a fedora-ds 1.0.4 server,
>>> but nothing changed; memory usage keeps growing...
>>> All settings are default except the mmr/changelog, and access.log is off.
>>>
>>> errors:
>>>
>>> Fedora-Directory/1.1.0 B2008.059.1017
>>> tower.fmintra.hu:389 (/opt/dirsrv/etc/dirsrv/slapd-tower)
>>>
>>>
>>> [05/Mar/2008:10:19:20 +0100] - dblayer_instance_start: pagesize:
>>> 4096, pages: 128798, procpages: 5983
>>> [05/Mar/2008:10:19:20 +0100] - cache autosizing: import cache: 204800k
>>> [05/Mar/2008:10:19:21 +0100] - li_import_cache_autosize: 50,
>>> import_pages: 51200, pagesize: 4096
>>> [05/Mar/2008:10:19:21 +0100] - WARNING: Import is running with
>>> nsslapd-db-private-import-mem on; No other process is allowed to
>>> access the database
>>> [05/Mar/2008:10:19:21 +0100] - dblayer_instance_start: pagesize:
>>> 4096, pages: 128798, procpages: 5983
>>> [05/Mar/2008:10:19:21 +0100] - cache autosizing: import cache: 204800k
>>> [05/Mar/2008:10:19:21 +0100] - li_import_cache_autosize: 50,
>>> import_pages: 51200, pagesize: 4096
>>> [05/Mar/2008:10:19:21 +0100] - import userRoot: Beginning import job...
>>> [05/Mar/2008:10:19:21 +0100] - import userRoot: Index buffering
>>> enabled with bucket size 100
>>> [05/Mar/2008:10:19:21 +0100] - import userRoot: Processing file
>>> "/tmp/ldifZHth0D.ldif"
>>> [05/Mar/2008:10:19:21 +0100] - import userRoot: Finished scanning
>>> file "/tmp/ldifZHth0D.ldif" (9 entries)
>>> [05/Mar/2008:10:19:21 +0100] - import userRoot: Workers finished;
>>> cleaning up...
>>> [05/Mar/2008:10:19:21 +0100] - import userRoot: Workers cleaned up.
>>> [05/Mar/2008:10:19:21 +0100] - import userRoot: Cleaning up producer
>>> thread...
>>> [05/Mar/2008:10:19:21 +0100] - import userRoot: Indexing complete.
>>> Post-processing...
>>> [05/Mar/2008:10:19:21 +0100] - import userRoot: Flushing caches...
>>> [05/Mar/2008:10:19:21 +0100] - import userRoot: Closing files...
>>> [05/Mar/2008:10:19:21 +0100] - All database threads now stopped
>>> [05/Mar/2008:10:19:21 +0100] - import userRoot: Import complete.
>>> Processed 9 entries in 0 seconds.
>>> (inf entries/sec)
>>> [05/Mar/2008:10:19:22 +0100] - Fedora-Directory/1.1.0 B2008.059.1017
>>> starting up
>>> [05/Mar/2008:10:19:22 +0100] - I'm resizing my cache now...cache was
>>> 209715200 and is now 8000000
>>> [05/Mar/2008:10:19:22 +0100] - slapd started. Listening on All
>>> Interfaces port 389 for LDAP requests
>>> [05/Mar/2008:10:22:23 +0100] NSMMReplicationPlugin - changelog
>>> program - cl5Open: failed to open changelog
>>> [05/Mar/2008:10:22:24 +0100] NSMMReplicationPlugin - changelog
>>> program - changelog5_config_add: failed to start changelog
>>> [05/Mar/2008:10:26:49 +0100] NSMMReplicationPlugin - agmt="cn=replica
>>> to backup" (backup:389): Replica has a different generation ID than
>>> the local data.
>>> [05/Mar/2008:10:32:00 +0100] NSMMReplicationPlugin -
>>> repl_set_mtn_referrals: could not set referrals for replica
>>> dc=fmintra,dc=hu: 32
>>> [05/Mar/2008:10:32:00 +0100] NSMMReplicationPlugin -
>>> multimaster_be_state_change: replica dc=fmintra,dc=hu is going
>>> offline; disabling replication
>>> [05/Mar/2008:10:32:00 +0100] - WARNING: Import is running with
>>> nsslapd-db-private-import-mem on; No other process is allowed to
>>> access the database
>>> [05/Mar/2008:10:32:13 +0100] - import userRoot: Workers finished;
>>> cleaning up...
>>> [05/Mar/2008:10:32:13 +0100] - import userRoot: Workers cleaned up.
>>> [05/Mar/2008:10:32:13 +0100] - import userRoot: Indexing complete.
>>> Post-processing...
>>> [05/Mar/2008:10:32:13 +0100] - import userRoot: Flushing caches...
>>> [05/Mar/2008:10:32:13 +0100] - import userRoot: Closing files...
>>> [05/Mar/2008:10:32:14 +0100] - import userRoot: Import complete.
>>> Processed 12242 entries in 13 seconds.
>>> (941.69 entries/sec)
>>> [05/Mar/2008:10:32:14 +0100] NSMMReplicationPlugin -
>>> multimaster_be_state_change: replica dc=fmintra,dc=hu is coming
>>> online; enabling replication
>>>
>>> memory usage by top:
>>>
>>> top - 10:58:21 up 25 days, 22:36, 2 users, load average: 0.01,
>>> 0.13, 0.22
>>> Tasks: 61 total, 2 running, 59 sleeping, 0 stopped, 0 zombie
>>> Cpu(s): 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi,
>>> 0.0%si, 0.0%st
>>> Mem:  515192k total, 189600k used, 325592k free,  36472k buffers
>>> Swap: 489848k total,  18292k used, 471556k free, 106188k cached
>>>
>>>   PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+ COMMAND
>>> 27647 fds   15   0  464m  47m  25m S  0.0  9.4  1:34.57 ns-slapd
>>>
>>>
>>> top - 11:23:12 up 25 days, 23:01, 2 users, load average: 0.36,
>>> 0.27, 0.20
>>> Tasks: 61 total, 2 running, 59 sleeping, 0 stopped, 0 zombie
>>> Cpu(s): 3.0%us, 0.0%sy, 0.0%ni, 96.0%id, 1.0%wa, 0.0%hi,
>>> 0.0%si, 0.0%st
>>> Mem:  515192k total, 210700k used, 304492k free,  36488k buffers
>>> Swap: 489848k total,  18288k used, 471560k free, 117204k cached
>>>
>>>   PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+ COMMAND
>>> 27647 fds   15   0  473m  59m  28m S  3.0 11.9  2:52.77 ns-slapd
>>>
>>>
>>> top - 11:48:26 up 25 days, 23:26, 2 users, load average: 0.02,
>>> 0.08, 0.10
>>> Tasks: 61 total, 1 running, 60 sleeping, 0 stopped, 0 zombie
>>> Cpu(s): 3.0%us, 0.0%sy, 0.0%ni, 97.0%id, 0.0%wa, 0.0%hi,
>>> 0.0%si, 0.0%st
>>> Mem:  515192k total, 222756k used, 292436k free,  36520k buffers
>>> Swap: 489848k total,  18288k used, 471560k free, 118932k cached
>>>
>>>   PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+ COMMAND
>>> 27647 fds   15   0  483m  72m  30m S  0.0 14.4  4:12.04 ns-slapd
>>>
>>>
>>> top - 13:31:42 up 26 days, 1:09, 2 users, load average: 0.28,
>>> 0.17, 0.15
>>> Tasks: 61 total, 2 running, 59 sleeping, 0 stopped, 0 zombie
>>> Cpu(s): 1.1%us, 0.0%sy, 0.0%ni, 98.9%id, 0.0%wa, 0.0%hi,
>>> 0.0%si, 0.0%st
>>> Mem:  515192k total, 285572k used, 229620k free,  36540k buffers
>>> Swap: 489848k total,  18288k used, 471560k free, 140412k cached
>>>
>>>   PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+ COMMAND
>>> 27647 fds   15   0  523m 116m  34m S  0.0 23.3  9:35.65 ns-slapd
>
>> Can you post your dse.ldif to pastebin.com? Be sure to omit or
>> obscure any sensitive data first. I'd like to see what all of your
>> cache settings are. Normally the server will increase in memory usage
>> until the caches are full, then memory usage should level off. The
>> speed at which this occurs depends on usage.
>>
> http://www.pastebin.org/22477
>
> I forgot one thing: I use some custom schema (ldapdns, ibm... etc.), in
> case that changes anything (but I think this is not relevant).
>
>> When the kernel kills your server, how much memory is it using? Is
>> there anything in the server error log at around the time the kernel
>> kills it?
>>
> I'm not sure, but at that time it uses as much as possible (512 MB RAM +
> 512 MB swap available), I think around 940 MB. The kernel first kills some
> other processes, like mc, and after those the ns-slapd. I can't see
> anything in the log file, just the server start.
>
>> Finally, if you are convinced that there is a real memory leak in the
>> server, would it be possible for you to run it under valgrind? Just
>> running it under valgrind for 30 minutes or so should reveal any
>> memory leaks in normal usage.
>
> http://www.pastebin.org/22484
>
> I can't understand this output; I have never used valgrind before. I hope
> I used the right options for valgrind.
>

Can you tell me what the valgrind output means?

thanks,
KEeF
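
For reference, the four top snapshots quoted above already give a rough growth rate for ns-slapd. The following is only a minimal Python sketch, not something from the original exchange: the names `samples` and `growth_rates` are made up for illustration, the RES column is read as MiB, and all four samples are assumed to fall on the same day (which the uptime values in the top headers suggest).

```python
from datetime import datetime

# (time of day, RES in MiB) for ns-slapd, copied from the four top
# snapshots quoted above. Assumption: all samples are from the same day.
samples = [
    ("10:58:21", 47),
    ("11:23:12", 59),
    ("11:48:26", 72),
    ("13:31:42", 116),
]


def minutes_between(start, end):
    """Elapsed minutes between two HH:MM:SS strings on the same day."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return delta.total_seconds() / 60.0


def growth_rates(points):
    """RES growth in MiB per minute between consecutive samples."""
    rates = []
    for (t0, res0), (t1, res1) in zip(points, points[1:]):
        rates.append((res1 - res0) / minutes_between(t0, t1))
    return rates


rates = growth_rates(samples)

# Average rate over the whole observation window (first to last sample).
total_minutes = minutes_between(samples[0][0], samples[-1][0])
overall_rate = (samples[-1][1] - samples[0][1]) / total_minutes

for (t0, _), (t1, _), rate in zip(samples, samples[1:], rates):
    print(f"{t0} -> {t1}: {rate:.2f} MiB/min")
print(f"overall: {overall_rate:.2f} MiB/min over {total_minutes:.0f} min")
```

With these numbers the rate stays between roughly 0.4 and 0.5 MiB per minute over about two and a half hours, with no sign of levelling off inside this window. That alone does not prove a leak, since the caches may simply still be filling; a longer series of samples would show whether the curve flattens.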