On 9/11/2009 12:43 PM, Noriko Hosoi wrote:
> On 09/10/2009 07:46 PM, Kevin Bowling wrote:
>> Hi,
>>
>> I have been running FDS/389 on an F11 xen DomU for several months. I
>> use it as the backend for UNIX usernames/passwords and also for
>> Redmine (a Ruby on Rails bug tracker) for http://www.gnucapplus.org/.
>>
>> This VM would regularly lock up every week or so when 389 was still
>> called FDS. I've since upgraded to 389 by issuing 'yum upgrade' as
>> well as running the 'setup-...-.pl -u' script, and now it barely goes
>> a day before crashing. When ldap crashes, the whole box basically
>> becomes unresponsive.
>>
>> I left the Xen hardware console open to see what was up, and the only
>> thing I could conclude was that 389 was crashing (if I issued a
>> service start it came back to life). Doing anything like a top or ls
>> will completely kill the box. Likewise, the logs show nothing at or
>> before the time of the crash. I suspected too few file descriptors,
>> but changing that to a very high number had no impact.
>>
>> I was about to do a rip-and-replace with OpenLDAP, which I use very
>> successfully for our corporate systems, but figured I ought to see if
>> anyone here can help or if I can submit any kind of meaningful bug
>> report first. I assume I will need to run 389's slapd without
>> daemonizing it and hope it spits something useful out to stderr. Any
>> advice here would be greatly appreciated, as would any success
>> stories of using 389 on F11.
> Hello Kevin,
>
> You specified the platform "F11 xen DomU". Did you have a chance to
> run the 389 server on any other platforms? I'm wondering whether the
> crash is observed only on that specific platform. Is the server
> running on a 64-bit machine or 32-bit?
>
> If you start the server with the "-d 1" option, the server will run in
> trace mode. (E.g., /usr/lib[64]/dirsrv/slapd-YOURID/start-slapd -d 1)
>
> I'm afraid it might be a memory leak. When you restart the 389
> server, could you check the size of ns-slapd from time to time, like
> every hour, and see whether the server size keeps growing or stops?
> Also, the server quits if it fails to write to the errors log. If that
> happens, it's logged in the system log. Does the messages file on the
> system happen to have any logs related to the 389 server?
>
> Thanks,
> --noriko
>>
>> I'm not subscribed to the list so please CC.
>>
>> Regards,
>>
>> Kevin Bowling
>>
>> --
>> 389 users mailing list
>> 389-users at redhat.com
>> https://www.redhat.com/mailman/listinfo/fedora-directory-users
>

I captured some output while running in trace mode; see the end of this
message. The system is 64-bit, and I have not run it on any other boxes.
A cursory look with top showed only 10MB or so of RSS memory.
Regards,
Kevin

[11/Sep/2009:09:58:44 -0700] - => id2entry( 48 )
[11/Sep/2009:09:58:44 -0700] - <= id2entry 7f025401f5a0 (cache)
[11/Sep/2009:09:58:44 -0700] - => id2entry( 50 )
[11/Sep/2009:09:58:44 -0700] - <= id2entry 7f0254021190 (cache)
[11/Sep/2009:09:58:44 -0700] - => slapi_reslimit_get_integer_limit() conn=0xa856beb0, handle=3
[11/Sep/2009:09:58:44 -0700] - <= slapi_reslimit_get_integer_limit() returning NO VALUE
[11/Sep/2009:09:58:44 -0700] - => slapi_reslimit_get_integer_limit() conn=0xa856bc60, handle=3
[11/Sep/2009:09:58:44 -0700] - <= slapi_reslimit_get_integer_limit() returning NO VALUE
[11/Sep/2009:09:58:44 -0700] - => slapi_reslimit_get_integer_limit() conn=0xa856bd88, handle=3
[11/Sep/2009:09:58:44 -0700] - <= slapi_reslimit_get_integer_limit() returning NO VALUE
[11/Sep/2009:09:58:44 -0700] - => slapi_reslimit_get_integer_limit() conn=0xa856bb38, handle=3
[11/Sep/2009:09:58:44 -0700] - <= slapi_reslimit_get_integer_limit() returning NO VALUE
[11/Sep/2009:09:58:44 -0700] - => slapi_reslimit_get_integer_limit() conn=0xa856ba10, handle=3
[11/Sep/2009:09:58:44 -0700] - <= slapi_reslimit_get_integer_limit() returning NO VALUE
[11/Sep/2009:09:58:44 -0700] - => send_ldap_result 0::
[11/Sep/2009:09:58:44 -0700] - <= send_ldap_result
[11/Sep/2009:09:58:50 -0700] - ldbm backend flushing
[11/Sep/2009:09:58:50 -0700] - ldbm backend done flushing
[11/Sep/2009:09:58:50 -0700] - ldbm backend flushing
[11/Sep/2009:09:58:50 -0700] - ldbm backend done flushing
[11/Sep/2009:09:59:20 -0700] - ldbm backend flushing
[11/Sep/2009:09:59:20 -0700] - ldbm backend done flushing
[11/Sep/2009:09:59:20 -0700] - ldbm backend flushing
[11/Sep/2009:09:59:20 -0700] - ldbm backend done flushing
[11/Sep/2009:09:59:50 -0700] - ldbm backend flushing
[11/Sep/2009:09:59:50 -0700] - ldbm backend done flushing
[11/Sep/2009:09:59:50 -0700] - ldbm backend flushing
[11/Sep/2009:09:59:50 -0700] - ldbm backend done flushing
[11/Sep/2009:10:00:20 -0700] - ldbm backend flushing
[11/Sep/2009:10:00:20 -0700] - ldbm backend done flushing
[11/Sep/2009:10:00:20 -0700] - ldbm backend flushing
[11/Sep/2009:10:00:20 -0700] - ldbm backend done flushing
[11/Sep/2009:10:00:50 -0700] - ldbm backend flushing
[11/Sep/2009:10:01:03 -0700] - ldbm backend done flushing
[11/Sep/2009:10:01:03 -0700] - ldbm backend flushing
[11/Sep/2009:10:01:04 -0700] - ldbm backend done flushing
[11/Sep/2009:10:01:35 -0700] - ldbm backend flushing
[11/Sep/2009:10:01:39 -0700] - ldbm backend done flushing
[11/Sep/2009:10:01:39 -0700] - ldbm backend flushing
[11/Sep/2009:10:01:39 -0700] - ldbm backend done flushing
[11/Sep/2009:10:01:39 -0700] - ldbm backend flushing
[11/Sep/2009:10:01:39 -0700] - ldbm backend done flushing
[11/Sep/2009:10:01:39 -0700] - ldbm backend flushing
[11/Sep/2009:10:01:39 -0700] - ldbm backend done flushing
[11/Sep/2009:10:01:39 -0700] - slapd shutting down - signaling operation threads
[11/Sep/2009:10:01:40 -0700] - slapd shutting down - waiting for 30 threads to terminate
[11/Sep/2009:10:01:40 -0700] - op_thread received shutdown signal
[11/Sep/2009:10:01:40 -0700] - slapd shutting down - waiting for 29 threads to terminate
[11/Sep/2009:10:01:40 -0700] - op_thread received shutdown signal
[11/Sep/2009:10:01:40 -0700] - op_thread received shutdown signal
[11/Sep/2009:10:01:40 -0700] - op_thread received shutdown signal
[11/Sep/2009:10:01:40 -0700] - op_thread received shutdown signal
[11/Sep/2009:10:01:40 -0700] - op_thread received shutdown signal
[11/Sep/2009:10:01:40 -0700] - op_thread received shutdown signal
[11/Sep/2009:10:01:40 -0700] - op_thread received shutdown signal
[11/Sep/2009:10:01:40 -0700] - op_thread received shutdown signal
[11/Sep/2009:10:01:40 -0700] - op_thread received shutdown signal
[11/Sep/2009:10:01:40 -0700] - op_thread received shutdown signal
[11/Sep/2009:10:01:41 -0700] - op_thread received shutdown signal
[11/Sep/2009:10:01:41 -0700] - op_thread received shutdown signal
[11/Sep/2009:10:01:41 -0700] - op_thread received shutdown signal
[11/Sep/2009:10:01:41 -0700] - op_thread received shutdown signal
[11/Sep/2009:10:01:41 -0700] - op_thread received shutdown signal
[11/Sep/2009:10:01:41 -0700] - op_thread received shutdown signal
[11/Sep/2009:10:01:41 -0700] - op_thread received shutdown signal
[11/Sep/2009:10:01:41 -0700] - op_thread received shutdown signal
[11/Sep/2009:10:01:41 -0700] - op_thread received shutdown signal
[11/Sep/2009:10:01:41 -0700] - op_thread received shutdown signal
[11/Sep/2009:10:01:41 -0700] - op_thread received shutdown signal
[11/Sep/2009:10:01:41 -0700] - op_thread received shutdown signal
[11/Sep/2009:10:01:41 -0700] - op_thread received shutdown signal
[11/Sep/2009:10:01:41 -0700] - op_thread received shutdown signal
[11/Sep/2009:10:01:41 -0700] - op_thread received shutdown signal
[11/Sep/2009:10:01:41 -0700] - op_thread received shutdown signal
[11/Sep/2009:10:01:41 -0700] - op_thread received shutdown signal
[11/Sep/2009:10:01:41 -0700] - op_thread received shutdown signal
[11/Sep/2009:10:01:41 -0700] - op_thread received shutdown signal
[11/Sep/2009:10:01:41 -0700] - slapd shutting down - waiting for 28 threads to terminate
[11/Sep/2009:10:01:41 -0700] - slapd shutting down - closing down internal subsystems and plugins
[11/Sep/2009:10:01:41 -0700] - slapd shutting down - waiting for backends to close down
[11/Sep/2009:10:01:42 -0700] - => slapi_control_present (looking for 1.3.6.1.4.1.42.2.27.8.5.1)
[11/Sep/2009:10:01:42 -0700] - <= slapi_control_present 0 (NO CONTROLS)
[11/Sep/2009:10:01:42 -0700] - modify_update_last_modified_attr
[11/Sep/2009:10:01:42 -0700] - Calling plugin 'Distributed Numeric Assignment internal preop plugin' #0 type 421
[11/Sep/2009:10:01:42 -0700] dna-plugin - --> dna_pre_op
[11/Sep/2009:10:01:42 -0700] dna-plugin - <-- dna_pre_op
[11/Sep/2009:10:01:42 -0700] - Calling plugin 'Legacy replication internal preoperation plugin' #1 type 421
[11/Sep/2009:10:01:42 -0700] - Calling plugin 'Multimaster replication internal preoperation plugin' #2 type 421
[11/Sep/2009:10:01:42 -0700] - => entry_apply_mods
[11/Sep/2009:10:01:42 -0700] - <= entry_apply_mods 0
[11/Sep/2009:10:01:42 -0700] - => send_ldap_result 0::
[11/Sep/2009:10:01:42 -0700] - <= send_ldap_result
[11/Sep/2009:10:01:42 -0700] - ps_service_persistent_searches: entry "cn=uniqueid generator,cn=config" not enqueued on any persistent search lists
[11/Sep/2009:10:01:42 -0700] - Calling plugin 'Class of Service internalpostoperation plugin' #0 type 521
[11/Sep/2009:10:01:42 -0700] - --> cos_post_op
[11/Sep/2009:10:01:42 -0700] - --> cos_cache_change_notify
[11/Sep/2009:10:01:42 -0700] - --> cos_cache_template_index_bsearch
[11/Sep/2009:10:01:42 -0700] - --> cos_cache_getref
[11/Sep/2009:10:01:42 -0700] - <-- cos_cache_getref
[11/Sep/2009:10:01:42 -0700] - <-- cos_cache_template_index_bsearch
[11/Sep/2009:10:01:42 -0700] - <-- cos_cache_change_notify
[11/Sep/2009:10:01:42 -0700] - <-- cos_post_op
[11/Sep/2009:10:01:42 -0700] - Calling plugin 'Legacy replication internal postoperation plugin' #1 type 521
[11/Sep/2009:10:01:43 -0700] - Calling plugin 'Multimaster replication internal postoperation plugin' #2 type 521
[11/Sep/2009:10:01:43 -0700] - Calling plugin 'Retrocl internal postoperation plugin' #3 type 521 not applying change if not logging
[11/Sep/2009:10:01:43 -0700] - Calling plugin 'Roles internalpostoperation plugin' #4 type 521
[11/Sep/2009:10:01:43 -0700] - Calling plugin 'Legacy Replication Plugin' #0 type 210
[11/Sep/2009:10:01:43 -0700] - Calling plugin 'Roles Plugin' #0 type 210
[11/Sep/2009:10:01:43 -0700] - Calling plugin 'Multimaster Replication Plugin' #0 type 210
[11/Sep/2009:10:01:43 -0700] - Calling plugin 'HTTP Client' #0 type 210
[11/Sep/2009:10:01:43 -0700] - Calling plugin 'Class of Service' #0 type 210
[11/Sep/2009:10:01:43 -0700] - --> cos_close
[11/Sep/2009:10:01:43 -0700] - --> cos_cache_stop
[11/Sep/2009:10:01:43 -0700] - <-- cos_cache_wait_on_change thread exit
[11/Sep/2009:10:01:43 -0700] - --> cos_cache_release
[11/Sep/2009:10:01:43 -0700] - <-- cos_cache_release
[11/Sep/2009:10:01:43 -0700] - <-- cos_cache_stop
[11/Sep/2009:10:01:43 -0700] - <-- cos_close
[11/Sep/2009:10:01:43 -0700] - Calling plugin 'ACL Plugin' #0 type 210
[11/Sep/2009:10:01:43 -0700] - Calling plugin 'Views' #0 type 210
[11/Sep/2009:10:01:43 -0700] views-plugin - --> views_close
[11/Sep/2009:10:01:43 -0700] views-plugin - --> views_cache_free
[11/Sep/2009:10:01:43 -0700] views-plugin - <-- views_cache_free
[11/Sep/2009:10:01:43 -0700] views-plugin - <-- views_close
[11/Sep/2009:10:01:43 -0700] - Calling plugin 'State Change Plugin' #0 type 210
[11/Sep/2009:10:01:43 -0700] statechange-plugin - --> statechange_close
[11/Sep/2009:10:01:43 -0700] statechange-plugin - <-- statechange_close
[11/Sep/2009:10:01:43 -0700] - Calling plugin 'ldbm database' #0 type 210
[11/Sep/2009:10:01:43 -0700] - ldbm backend syncing
[11/Sep/2009:10:01:43 -0700] - Waiting for 4 database threads to stop
[11/Sep/2009:10:01:43 -0700] - Leaving deadlock_threadmain
[11/Sep/2009:10:01:44 -0700] - Leaving checkpoint_threadmain before checkpoint
[11/Sep/2009:10:01:44 -0700] - Checkpointing database ...
[11/Sep/2009:10:01:44 -0700] - Leaving checkpoint_threadmain
[11/Sep/2009:10:01:44 -0700] - Leaving trickle_threadmain priv
[11/Sep/2009:10:01:44 -0700] - Leaving perf_threadmain
[11/Sep/2009:10:01:45 -0700] - All database threads now stopped
[11/Sep/2009:10:01:45 -0700] - ldbm backend done syncing
[11/Sep/2009:10:01:45 -0700] - Calling plugin 'chaining database' #0 type 210
[11/Sep/2009:10:01:45 -0700] - Removed [1] entries from the dse tree.
[11/Sep/2009:10:01:45 -0700] - Removed [166] entries from the dse tree.
[11/Sep/2009:10:01:45 -0700] - ldbm backend cleaning up
[11/Sep/2009:10:01:45 -0700] - ldbm backend cleaning up
[11/Sep/2009:10:01:45 -0700] - slapd shutting down - backends closed down
[11/Sep/2009:10:01:45 -0700] - => reslimit_update_from_entry() conn=0xa856ba10, entry=0x0
[11/Sep/2009:10:01:45 -0700] - reslimit_update_from_entry(): setting limit for handle 0 (based on nsLookThroughLimit)
[11/Sep/2009:10:01:45 -0700] - reslimit_update_from_entry(): setting limit for handle 1 (based on nsSizeLimit)
[11/Sep/2009:10:01:45 -0700] - reslimit_update_from_entry(): setting limit for handle 2 (based on nsTimeLimit)
[11/Sep/2009:10:01:45 -0700] - reslimit_update_from_entry(): setting limit for handle 3 (based on nsIdleTimeout)
[11/Sep/2009:10:01:45 -0700] - <= reslimit_update_from_entry() returning status 0
[11/Sep/2009:10:01:45 -0700] - => reslimit_update_from_entry() conn=0xa856bb38, entry=0x0
[11/Sep/2009:10:01:45 -0700] - reslimit_update_from_entry(): setting limit for handle 0 (based on nsLookThroughLimit)
[11/Sep/2009:10:01:45 -0700] - reslimit_update_from_entry(): setting limit for handle 1 (based on nsSizeLimit)
[11/Sep/2009:10:01:45 -0700] - reslimit_update_from_entry(): setting limit for handle 2 (based on nsTimeLimit)
[11/Sep/2009:10:01:45 -0700] - reslimit_update_from_entry(): setting limit for handle 3 (based on nsIdleTimeout)
[11/Sep/2009:10:01:45 -0700] - <= reslimit_update_from_entry() returning status 0
[11/Sep/2009:10:01:45 -0700] - => reslimit_update_from_entry() conn=0xa856bd88, entry=0x0
[11/Sep/2009:10:01:45 -0700] - reslimit_update_from_entry(): setting limit for handle 0 (based on nsLookThroughLimit)
[11/Sep/2009:10:01:45 -0700] - reslimit_update_from_entry(): setting limit for handle 1 (based on nsSizeLimit)
[11/Sep/2009:10:01:45 -0700] - reslimit_update_from_entry(): setting limit for handle 2 (based on nsTimeLimit)
[11/Sep/2009:10:01:45 -0700] - reslimit_update_from_entry(): setting limit for handle 3 (based on nsIdleTimeout)
[11/Sep/2009:10:01:45 -0700] - <= reslimit_update_from_entry() returning status 0
[11/Sep/2009:10:01:45 -0700] - => reslimit_update_from_entry() conn=0xa856beb0, entry=0x0
[11/Sep/2009:10:01:45 -0700] - reslimit_update_from_entry(): setting limit for handle 0 (based on nsLookThroughLimit)
[11/Sep/2009:10:01:45 -0700] - reslimit_update_from_entry(): setting limit for handle 1 (based on nsSizeLimit)
[11/Sep/2009:10:01:45 -0700] - reslimit_update_from_entry(): setting limit for handle 2 (based on nsTimeLimit)
[11/Sep/2009:10:01:45 -0700] - reslimit_update_from_entry(): setting limit for handle 3 (based on nsIdleTimeout)
[11/Sep/2009:10:01:45 -0700] - <= reslimit_update_from_entry() returning status 0
[11/Sep/2009:10:01:45 -0700] - slapd stopped.
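
For anyone wanting to follow Noriko's suggestion above of checking the
size of ns-slapd every hour, a minimal sketch of such a check might look
like the following; the one-hour interval, the output file, and the use
of ps are assumptions here, not something prescribed in the thread:

    #!/bin/sh
    # Append a timestamped RSS/VSZ sample (in kilobytes) for the running
    # ns-slapd process once an hour; the sample is empty if the process
    # has died. The log path and 3600-second interval are arbitrary.
    while true; do
        echo "$(date '+%d/%b/%Y:%H:%M:%S') $(ps -C ns-slapd -o rss=,vsz=)" \
            >> /tmp/ns-slapd-size.log
        sleep 3600
    done

If the server quit because it could not write to its errors log, the
system log is the place to look; something like
'grep ns-slapd /var/log/messages' should turn up any related entries.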