Christopher Wood wrote:
> I'm just getting started with 389 Directory Server (at work), and I've run into an issue that I'm not certain how to troubleshoot. I would greatly appreciate any assistance or tips you could offer, especially on where to look to see what's failing.
>
> Also, I apologize in advance for changing strings related to my employer's directory names and such, as I'm not comfortable with leaking that level information to a public list.
>
As well you should be - you should always obscure sensitive information like this.
>
> Overview:
>
> Initializing a large subtree from NDS 6.2 crashes ns-slapd, but other subtrees are fine.
>
>
> Top-Level Questions:
>
> 1) How do I stop ns-slapd from crashing?
>
Good question.
> 2) How do I figure out what precisely is causing the crash? (With various levels of debug logging I get the same log entry.)
>
You've already used the TRACE level (1) for logging - that's as verbose as it gets for this particular operation. Next step would be to try to get a core file.
> 3) Is it possible to simply import my initialization ldif without duplication checks?
>
No.
>
> Background:
>
> At work we have NDS 6.2 (single master on a physical server, virtual machine slaves), and would like to move our directories intact to a 389 2.6 installation via replication.
>
What platform/OS? 32-bit or 64-bit? By NDS 6.2 I'm assuming you mean Netscape Directory Server - by 2.6 I'm assuming you mean 1.2.6.a1 (a2 should be hitting the mirrors tomorrow).
> I already have replicated several of our NDS 6.2 subtrees to 389 2.6 with no difficulties.
>
> I compiled our 389 installation from the source packages downloaded from http://directory.fedoraproject.org/wiki/Source.
Did you grab 389-ds-base 1.2.6.a1 or 1.2.6.a2? What compiler flags did you use?

Do you have a core file?
If so, try using gdb:

    gdb /path/to/ns-slapd /path/to/core.pid

Once in gdb, type the "where" command:

    (gdb) where

> The underlying platform is:
>
> $ uname -a
> Linux cwlab-02.mycompany.com 2.6.18-164.el5 #1 SMP Thu Sep 3 03:33:56 EDT 2009 i686 i686 i386 GNU/Linux
> $ cat /etc/redhat-release
> CentOS release 5.4 (Final)
>
> $ free
>              total       used       free     shared    buffers     cached
> Mem:       3894000    1336012    2557988          0     144944    1004716
> -/+ buffers/cache:     186352    3707648
> Swap:      2031608          0    2031608
>
>
> Procedure To Crash 389's ns-slapd:
>
> a) In the NDS 6.2 admin console, create a new replication agreement for the "o=This Big Net" subtree, and choose to "Create consumer initialization file".
>
> b) Copy the file to the 389 server.
>
> c) In the 389 2.6 admin console for the Directory Server, in the Configuration tab (Data -> o=This Big Net -> dbRoot), right-click and choose "Initialize Database". Use the ldif file copied over.
>
> The ns-slapd process crashes, and I always get this in /opt/dirsrv/var/log/dirsrv/slapd-cwlab-02/errors as the last two lines:
>
> [03/Mar/2010:12:50:04 -0500] - import ldapAuthRoot: Processing file "/home/cwood/tbn.ldif"
> [03/Mar/2010:12:50:04 -0500] - => str2entry_dupcheck
>
>
> Other Details:
>
>
> I found two bugs with the str2entry_dupcheck string in it, but they don't seem pertinent:
>
> https://bugzilla.redhat.com/show_bug.cgi?id=548115
> https://bugzilla.redhat.com/show_bug.cgi?id=243488
>
>
> This says that str2entry_dupcheck could be about two things:
>
> http://docs.sun.com/source/816-6699-10/ax_errcd.html
>
> "While attempting to convert a string entry to an LDAP entry, the server found that the entry has no DN."
>
> "The server failed to add a value to the value tree."
>
> (But this is an exported database from NDS 6.2, and I'm fairly sure, without reading them all, that every entry will have a DN.)
>
The log message

    [03/Mar/2010:12:50:04 -0500] - => str2entry_dupcheck

is just trace information, not a report of a problem or error.
Does the crash happen almost immediately? Or does it take a while? If the problem happens quickly, it would be worthwhile to scan the first couple of dozen entries looking for things like
- entries without a DN
- attributes without a value
>
> If 389 is trying to check for duplicate entries, perhaps there are simply too many DNs?
>
> $ grep '^dn:' tbn.ldif | wc -l
> 636985
> $ ls -lh acc.ldif
> -rw-r--r-- 1 cwood cwood 755M Mar 3 11:24 tbn.ldif
>
No. The server should be able to handle this much data easily. And it must check for duplicate entries.
>
> Per the instructions here:
>
> http://directory.fedoraproject.org/wiki/FAQ#Troubleshooting
>
> I set my debug logging first to 24579:
>
> 1 Trace function calls
> 2 Debug packet handling
> 8192 Replication debugging
> 16384 Critical messages
>
> Then for the next try at reading logs I set it to 90115, the above plus:
>
> 65536 Plug-in debugging
>
> However, every time the log ended with the same set of lines noted above.
>
Trace (level 1) is really the best setting for this particular problem, and as you have found, it only tells you so much here. I think the next step would be to build the server with full debugging information (use -g and omit -O2 or any other -Ox) and get a stack trace with full debug information.
> --
> 389 users mailing list
> 389-users@xxxxxxxxxxxxxxxxxxxxxxx
> https://admin.fedoraproject.org/mailman/listinfo/389-users
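
P.S. For the scan of the first couple of dozen entries suggested above, something like the following might save some eyeballing. It is only a rough sketch, not anything shipped with the server: it assumes the export is ordinary LDIF (records separated by blank lines, folded lines continued with a leading space, "#" comments), and the function names (read_records, check_record, scan) are made up for illustration. It reports entries that do not start with a dn: line, lines with no ":" separator, and attributes with an empty value.

```python
import sys
from itertools import islice


def read_records(stream):
    """Yield (first_line_number, logical_lines) per blank-line-separated record."""
    record, start, in_comment = [], None, False
    for lineno, raw in enumerate(stream, 1):
        line = raw.rstrip("\r\n")
        if not line:                       # a blank line ends the current record
            if record:
                yield start, record
            record, start, in_comment = [], None, False
        elif line.startswith(" "):         # folded line: continues the previous one
            if record and not in_comment:
                record[-1] += line[1:]
        elif line.startswith("#"):         # comment line (may itself be folded)
            in_comment = True
        else:
            in_comment = False
            if start is None:
                start = lineno
            record.append(line)
    if record:
        yield start, record


def check_record(lines):
    """Return a list of problem descriptions for one record."""
    problems = []
    if lines and lines[0].lower().startswith("version:"):
        lines = lines[1:]                  # optional LDIF version header
    if not lines:
        return problems
    if not lines[0].lower().startswith("dn:"):
        problems.append("entry does not start with a dn: line")
    for line in lines:
        name, sep, value = line.partition(":")
        if not sep:
            problems.append(f"no ':' separator in {line!r}")
        elif not value.lstrip(":<").strip():   # also covers 'attr::' and 'attr:<'
            problems.append(f"attribute {name!r} has no value")
    return problems


def scan(stream, limit=24):
    """Check the first `limit` records; return (line_number, problem) pairs."""
    findings = []
    for start, lines in islice(read_records(stream), limit):
        findings.extend((start, p) for p in check_record(lines))
    return findings


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        for start, problem in scan(f):
            print(f"record starting at line {start}: {problem}")
```

Saved under some name of your choosing and run with the path to tbn.ldif as its only argument, it prints one line per suspicious record and nothing if the first 24 records look sane. A clean result does not rule out a bad entry further into the file, so raise the limit if the crash takes a while to appear.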