Re: notes on building fds in etch and a failed build question

Tamas Bagyal <bagyi@xxxxxxxxxxxxxxxx> · Mon, 10 Mar 2008 18:17:01 +0100

Rich Megginson wrote:
Tamas Bagyal wrote:
Tamas Bagyal wrote:
Rich Megginson wrote:
Tamas Bagyal wrote:
Rich Megginson wrote:
Bagyal Tamas wrote:
Rich Megginson wrote:
Tamas Bagyal wrote:
hello Ryan,

Have you tried this version? I have two fedora-ds 1.0.4 servers in an
MMR configuration. I migrated one of them to 1.1 (built following your
and Rich's instructions), but I have a problem with the memory
usage of the ns-slapd process. Initially memory usage is 18.5%, but
after 2 hours it had risen to 23.1%, and it kept growing until the
process was killed by the kernel (I think).

Mostly read transactions happen (DNS), with a few writes (CUPS).
This is Debian etch with 512 MB of memory (I know this is
too low, but this is a test environment). The slapd cache size
is 67108864.
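
For scale, the quoted cache size can be converted to MiB and compared with the machine's memory. This is a minimal sketch; it assumes the cache size is given in bytes and treats "512 mbyte" as 512 MiB. The message does not say which slapd cache the figure refers to.

```python
# Figures quoted in the message above.
CACHE_BYTES = 67108864   # reported slapd cache size (assumed to be bytes)
RAM_MIB = 512            # reported memory size (assumed to mean MiB)


def to_mib(n_bytes: int) -> float:
    """Convert a byte count to MiB."""
    return n_bytes / (1024 * 1024)


cache_mib = to_mib(CACHE_BYTES)
share = cache_mib / RAM_MIB
print(f"cache: {cache_mib:.0f} MiB, {share:.1%} of RAM")
# prints "cache: 64 MiB, 12.5% of RAM"
```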
Are you using SSL?  Anything interesting in your server error log?

I ran setupssl2.sh but do not use any SSL connections. The error
log shows nothing, only the server start.
The reason I ask is that older versions of the NSS crypto/SSL 
libraries had a memory leak.  NSS 3.11.7 does not have this 
problem.  But you would only see the problem if you were using SSL 
connections.

OK. I tried again from the beginning: fresh install, no SSL, no
migration, used setup-ds-admin.pl and set up MMR with a
fedora-ds 1.0.4 server. But nothing changed; memory usage keeps growing.
All settings are default except the MMR/changelog setup and access.log, which is off.

errors:

 Fedora-Directory/1.1.0 B2008.059.1017
        tower.fmintra.hu:389 (/opt/dirsrv/etc/dirsrv/slapd-tower)

[05/Mar/2008:10:19:20 +0100] - dblayer_instance_start: pagesize: 
4096, pages: 128798, procpages: 5983
[05/Mar/2008:10:19:20 +0100] - cache autosizing: import cache: 204800k
[05/Mar/2008:10:19:21 +0100] - li_import_cache_autosize: 50, 
import_pages: 51200, pagesize: 4096
[05/Mar/2008:10:19:21 +0100] - WARNING: Import is running with 
nsslapd-db-private-import-mem on; No other process is allowed to 
access the database
[05/Mar/2008:10:19:21 +0100] - dblayer_instance_start: pagesize: 
4096, pages: 128798, procpages: 5983
[05/Mar/2008:10:19:21 +0100] - cache autosizing: import cache: 204800k
[05/Mar/2008:10:19:21 +0100] - li_import_cache_autosize: 50, 
import_pages: 51200, pagesize: 4096
[05/Mar/2008:10:19:21 +0100] - import userRoot: Beginning import 
job...
[05/Mar/2008:10:19:21 +0100] - import userRoot: Index buffering 
enabled with bucket size 100
[05/Mar/2008:10:19:21 +0100] - import userRoot: Processing file 
"/tmp/ldifZHth0D.ldif"
[05/Mar/2008:10:19:21 +0100] - import userRoot: Finished scanning 
file "/tmp/ldifZHth0D.ldif" (9 entries)
[05/Mar/2008:10:19:21 +0100] - import userRoot: Workers finished; 
cleaning up...
[05/Mar/2008:10:19:21 +0100] - import userRoot: Workers cleaned up.
[05/Mar/2008:10:19:21 +0100] - import userRoot: Cleaning up 
producer thread...
[05/Mar/2008:10:19:21 +0100] - import userRoot: Indexing complete. 
Post-processing...
[05/Mar/2008:10:19:21 +0100] - import userRoot: Flushing caches...
[05/Mar/2008:10:19:21 +0100] - import userRoot: Closing files...
[05/Mar/2008:10:19:21 +0100] - All database threads now stopped
[05/Mar/2008:10:19:21 +0100] - import userRoot: Import complete.  
Processed 9 entries in 0 seconds. (inf entries/sec)
[05/Mar/2008:10:19:22 +0100] - Fedora-Directory/1.1.0 
B2008.059.1017 starting up
[05/Mar/2008:10:19:22 +0100] - I'm resizing my cache now...cache 
was 209715200 and is now 8000000
[05/Mar/2008:10:19:22 +0100] - slapd started.  Listening on All 
Interfaces port 389 for LDAP requests
[05/Mar/2008:10:22:23 +0100] NSMMReplicationPlugin - changelog 
program - cl5Open: failed to open changelog
[05/Mar/2008:10:22:24 +0100] NSMMReplicationPlugin - changelog 
program - changelog5_config_add: failed to start changelog
[05/Mar/2008:10:26:49 +0100] NSMMReplicationPlugin - 
agmt="cn=replica to backup" (backup:389): Replica has a different 
generation ID than the local data.
[05/Mar/2008:10:32:00 +0100] NSMMReplicationPlugin - 
repl_set_mtn_referrals: could not set referrals for replica 
dc=fmintra,dc=hu: 32
[05/Mar/2008:10:32:00 +0100] NSMMReplicationPlugin - 
multimaster_be_state_change: replica dc=fmintra,dc=hu is going 
offline; disabling replication
[05/Mar/2008:10:32:00 +0100] - WARNING: Import is running with 
nsslapd-db-private-import-mem on; No other process is allowed to 
access the database
[05/Mar/2008:10:32:13 +0100] - import userRoot: Workers finished; 
cleaning up...
[05/Mar/2008:10:32:13 +0100] - import userRoot: Workers cleaned up.
[05/Mar/2008:10:32:13 +0100] - import userRoot: Indexing complete. 
Post-processing...
[05/Mar/2008:10:32:13 +0100] - import userRoot: Flushing caches...
[05/Mar/2008:10:32:13 +0100] - import userRoot: Closing files...
[05/Mar/2008:10:32:14 +0100] - import userRoot: Import complete.  
Processed 12242 entries in 13 seconds. (941.69 entries/sec)
[05/Mar/2008:10:32:14 +0100] NSMMReplicationPlugin - 
multimaster_be_state_change: replica dc=fmintra,dc=hu is coming 
online; enabling replication
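
The sizes in the log above can be cross-checked with a little arithmetic. The sketch below uses only numbers that appear in the log and in the top output. Reading "pages" as physical memory pages is an inference from the fact that the product matches top's "Mem: total", not something the log states.

```python
# Values copied from the error log above.
PAGESIZE = 4096          # bytes, from dblayer_instance_start
PAGES = 128798           # from dblayer_instance_start
IMPORT_PAGES = 51200     # from li_import_cache_autosize

# pages * pagesize in KiB; compare with "Mem: 515192k total" from top.
total_kib = PAGES * PAGESIZE // 1024

# import_pages * pagesize; compare with "import cache: 204800k"
# and with "cache was 209715200".
import_cache_kib = IMPORT_PAGES * PAGESIZE // 1024
import_cache_bytes = IMPORT_PAGES * PAGESIZE

# Second import: 12242 entries in 13 seconds.
entries_per_sec = 12242 / 13

print(total_kib, import_cache_kib, import_cache_bytes)
print(f"{entries_per_sec:.2f} entries/sec")
```

The results (515192, 204800, 209715200 and 941.69) agree with the figures the server printed, so the log is internally consistent and the import cache was sized at roughly 200 MiB before being reset to 8000000 bytes at startup.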

memory usage by top:

top - 10:58:21 up 25 days, 22:36,  2 users,  load average: 0.01, 
0.13, 0.22
Tasks:  61 total,   2 running,  59 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  
0.0%si,  0.0%st
Mem:    515192k total,   189600k used,   325592k free,    36472k 
buffers
Swap:   489848k total,    18292k used,   471556k free,   106188k 
cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
27647 fds       15   0  464m  47m  25m S  0.0  9.4   1:34.57 ns-slapd

top - 11:23:12 up 25 days, 23:01,  2 users,  load average: 0.36, 
0.27, 0.20
Tasks:  61 total,   2 running,  59 sleeping,   0 stopped,   0 zombie
Cpu(s):  3.0%us,  0.0%sy,  0.0%ni, 96.0%id,  1.0%wa,  0.0%hi,  
0.0%si,  0.0%st
Mem:    515192k total,   210700k used,   304492k free,    36488k 
buffers
Swap:   489848k total,    18288k used,   471560k free,   117204k 
cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
27647 fds       15   0  473m  59m  28m S  3.0 11.9   2:52.77 ns-slapd

top - 11:48:26 up 25 days, 23:26,  2 users,  load average: 0.02, 
0.08, 0.10
Tasks:  61 total,   1 running,  60 sleeping,   0 stopped,   0 zombie
Cpu(s):  3.0%us,  0.0%sy,  0.0%ni, 97.0%id,  0.0%wa,  0.0%hi,  
0.0%si,  0.0%st
Mem:    515192k total,   222756k used,   292436k free,    36520k 
buffers
Swap:   489848k total,    18288k used,   471560k free,   118932k 
cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
27647 fds       15   0  483m  72m  30m S  0.0 14.4   4:12.04 ns-slapd

top - 13:31:42 up 26 days,  1:09,  2 users,  load average: 0.28, 
0.17, 0.15
Tasks:  61 total,   2 running,  59 sleeping,   0 stopped,   0 zombie
Cpu(s):  1.1%us,  0.0%sy,  0.0%ni, 98.9%id,  0.0%wa,  0.0%hi,  
0.0%si,  0.0%st
Mem:    515192k total,   285572k used,   229620k free,    36540k 
buffers
Swap:   489848k total,    18288k used,   471560k free,   140412k 
cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
27647 fds       15   0  523m 116m  34m S  0.0 23.3   9:35.65 ns-slapd
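
The four top snapshots above can be reduced to an average growth rate of resident memory. This is a rough sketch: the times and RES values are copied from the snapshots, top rounds RES to whole megabytes, and all samples are assumed to fall on the same day (the uptime figures suggest they do).

```python
from datetime import datetime

# (time of snapshot, RES of ns-slapd in MiB), copied from the top output above.
SAMPLES = [
    ("10:58:21", 47),
    ("11:23:12", 59),
    ("11:48:26", 72),
    ("13:31:42", 116),
]


def growth_mib_per_hour(samples):
    """Average RES growth between the first and last sample."""
    fmt = "%H:%M:%S"
    t0, res0 = samples[0]
    t1, res1 = samples[-1]
    elapsed = datetime.strptime(t1, fmt) - datetime.strptime(t0, fmt)
    hours = elapsed.total_seconds() / 3600
    return (res1 - res0) / hours


rate = growth_mib_per_hour(SAMPLES)
print(f"average growth: {rate:.1f} MiB/hour")
```

Over this window the process grows by 69 MiB in a little over two and a half hours, with no sign of slowing between the third and fourth samples.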

Can you post your dse.ldif to pastebin.com?  Be sure to omit or 
obscure any sensitive data first.  I'd like to see what all of your 
cache settings are.  Normally the server will increase in memory 
usage until the caches are full, then memory usage should level 
off.  The speed at which this occurs depends on usage.
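
One way to tell whether memory usage levels off as described is to sample the resident size of ns-slapd at regular intervals and look at the recent spread. The sketch below is an assumption-laden illustration, not part of the server tooling: it reads the Linux-specific /proc/PID/status file, and the helper names, window and tolerance are hypothetical choices.

```python
import re
import time


def parse_vmrss_kib(status_text):
    """Extract the VmRSS value (in kB) from /proc/<pid>/status text."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    if match is None:
        raise ValueError("no VmRSS line found")
    return int(match.group(1))


def sample_rss(pid, count, interval_sec):
    """Read VmRSS `count` times, sleeping `interval_sec` between reads."""
    readings = []
    for i in range(count):
        with open(f"/proc/{pid}/status") as f:
            readings.append(parse_vmrss_kib(f.read()))
        if i < count - 1:
            time.sleep(interval_sec)
    return readings


def has_levelled_off(readings_kib, window=6, tolerance_kib=1024):
    """True if the last `window` readings differ by at most `tolerance_kib`."""
    if len(readings_kib) < window:
        return False
    recent = readings_kib[-window:]
    return max(recent) - min(recent) <= tolerance_kib
```

For example, `has_levelled_off(sample_rss(27647, 12, 300))` would take one reading every five minutes for about an hour (27647 is the PID shown in the top output above). A result of False under steady load, well after the caches should have filled, would support the suspicion of a leak.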

http://www.pastebin.org/22477

I forgot one thing: I use some custom schema (ldapdns, ibm, etc.), in
case that changes anything (but I do not think it is relevant).

When the kernel kills your server, how much memory is it using?  Is 
there anything in the server error log at around the time the kernel 
kills it?

I'm not sure, but by then it is using as much as possible (512 MB RAM +
512 MB swap available), I think around 940 MB. The kernel first kills some
other processes, like mc, and after these ns-slapd. I can't see
anything in the log file, just the server start.

Finally, if you are convinced that there is a real memory leak in 
the server, would it be possible for you to run it under valgrind?  
Just running it under valgrind for 30 minutes or so should reveal 
any memory leaks in normal usage.
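
A valgrind run of the kind suggested here might be assembled as follows. This is only a sketch under stated assumptions: the valgrind options are standard memcheck options, but the ns-slapd binary path is hypothetical, and the `-D` (configuration directory) and `-d 0` (stay in the foreground) options should be checked against the local ns-slapd build before use. The configuration directory is the one shown in the error log above.

```python
import shlex


def valgrind_command(slapd_binary, config_dir, log_file):
    """Build an argument list for running ns-slapd under valgrind memcheck."""
    return [
        "valgrind",
        "--tool=memcheck",
        "--leak-check=full",       # report each leaked block with a stack trace
        "--num-callers=40",        # deeper stacks make leaks easier to attribute
        f"--log-file={log_file}",  # keep valgrind output out of the server log
        slapd_binary,
        "-D", config_dir,          # assumed: instance configuration directory
        "-d", "0",                 # assumed: debug level 0, run in foreground
    ]


cmd = valgrind_command(
    "/opt/dirsrv/sbin/ns-slapd",            # hypothetical path
    "/opt/dirsrv/etc/dirsrv/slapd-tower",   # from the error log above
    "/tmp/valgrind-fds-test",               # hypothetical log location
)
print(" ".join(shlex.quote(arg) for arg in cmd))
```

The server has to be stopped cleanly after the test period, since memcheck only writes its leak summary when the process exits.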

http://www.pastebin.org/22484

I can't understand this output; I have never used valgrind before. I hope
I used the right options for valgrind.

Can you tell me what the valgrind output means?
I'm not sure.  The output is truncated, and valgrind is producing a lot 
of spurious errors, or at least errors not in directory server code.  I 
guess pastebin is not going to like a several hundred thousand byte 
output file - is there somewhere else you can post the entire output?
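
When the full log is too large to read by eye, the leak summary that memcheck prints at exit is a reasonable place to start. The sketch below assumes the usual "==PID==    definitely lost: N bytes in M blocks" line format; the function name is hypothetical, and the exact wording can differ between valgrind versions.

```python
import re

# Matches summary lines such as:
#   ==28385==    definitely lost: 1,024 bytes in 4 blocks.
SUMMARY_RE = re.compile(
    r"^==\d+==\s+"
    r"(definitely lost|indirectly lost|possibly lost|still reachable):\s+"
    r"([\d,]+) bytes in ([\d,]+) blocks",
    re.MULTILINE,
)


def leak_summary(log_text):
    """Return {category: (bytes, blocks)} from a memcheck log."""
    summary = {}
    for category, n_bytes, n_blocks in SUMMARY_RE.findall(log_text):
        summary[category] = (
            int(n_bytes.replace(",", "")),
            int(n_blocks.replace(",", "")),
        )
    return summary
```

A large "definitely lost" figure that keeps growing with run time points at a real leak; "still reachable" memory is usually caches and one-time allocations. If the log was truncated before the process exited, the summary will be missing and the function returns an empty dict.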

Sorry, I did not check it after pasting.
I hope you can access the output here: http://keef.uw.hu/valgrind-fds-test.28385

KeeF

--
Fedora-directory-users mailing list
Fedora-directory-users@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/fedora-directory-users