Noriko Hosoi wrote:
> On 02/05/2010 08:32 AM, Francesco Fiore wrote:
>>
>> Francesco Fiore wrote:
>>>
>>> Rich Megginson wrote:
>>>> Francesco Fiore wrote:
>>>>
>>>>> Hi,
>>>>> I have two directory servers in a multimaster configuration. I have to
>>>>> reinitialize all databases on the 2nd server (B) using the data from the 1st (A).
>>>>> After the synchronization, server B crashes with a segmentation fault.
>>>>> There isn't any relevant message in the error log.
>>>>> If I restart directory server B, I get the same error.
>>>>> The directory server version is 1.1.3 on Red Hat 5.
>>>>>
>>>> rpm -qi fedora-ds-base
>>>>
>>>> 32-bit or 64-bit?
>>>>
>>>> We have fixed quite a few replication bugs since 1.1.3, including a
>>>> couple of crashes. I recommend upgrading to the latest.
>>>>
>>> # rpm -qi 389-ds-base
>>> Name        : 389-ds-base                  Relocations: (not relocatable)
>>> Version     : 1.2.4                        Vendor: Fedora Project
>>> Release     : 1.el5                        Build Date: Tue 03 Nov 2009 04:47:39 PM CET
>>> Install Date: Fri 05 Feb 2010 11:49:11 AM CET Build Host: x86-6.fedora.phx.redhat.com
>>> Group       : System Environment/Daemons   Source RPM: 389-ds-base-1.2.4-1.el5.src.rpm
>>> Size        : 5339258                      License: GPLv2 with exceptions
>>> Signature   : DSA/SHA1, Fri 06 Nov 2009 05:17:38 PM CET, Key ID 119cc036217521f6
>>> Packager    : Fedora Project
>>> URL         : http://port389.org/
>>> Summary     : 389 Directory Server (base)
>>> Description :
>>>
>>> x86-64
>>>
>>> I updated to the latest stable version, but I get the same error.
>>> I traced the running process and found that the segmentation
>>> fault is probably caused by the futex system call. The tail of
>>> the strace output is below.
>>>
>>> getpeername(6, 0x7fff8256e3a0, [1475252821577171056]) = -1 ENOTCONN (Transport endpoint is not connected)
>>> poll([{fd=42, events=POLLIN}, {fd=-1}, {fd=6, events=POLLIN}, {fd=-1}, {fd=65, events=POLLIN}], 5, 250) = 1 ([{fd=65, revents=POLLIN}])
>>> futex(0x145f806c, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x145f8068, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1
>>> futex(0x145d0850, FUTEX_WAKE_PRIVATE, 1) = 1
>>> getpeername(6, 0x7fff8256e3a0, [1475252821577171056]) = -1 ENOTCONN (Transport endpoint is not connected)
>>> poll([{fd=42, events=POLLIN}, {fd=-1}, {fd=6, events=POLLIN}, {fd=-1}], 4, 250) = 1 ([{fd=42, revents=POLLIN}])
>>> read(42, "\0", 200) = 1
>>> getpeername(6, 0x7fff8256e3a0, [1475252821577171056]) = -1 ENOTCONN (Transport endpoint is not connected)
>>> poll([{fd=42, events=POLLIN}, {fd=-1}, {fd=6, events=POLLIN}, {fd=-1}, {fd=64, events=POLLIN}], 5, 250) = 1 ([{fd=64, revents=POLLIN}])
>>> futex(0x145f806c, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x145f8068, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1
>>> futex(0x14550730, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
>>> getpeername(6, 0x7fff8256e3a0, [1475252821577171056]) = -1 ENOTCONN (Transport endpoint is not connected)
>>> poll([{fd=42, events=POLLIN}, {fd=-1}, {fd=6, events=POLLIN}, {fd=-1}, {fd=65, events=POLLIN}], 5, 250) = 1 ([{fd=65, revents=POLLIN}])
>>> futex(0x145f806c, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x145f8068, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1
>>> futex(0x145d0850, FUTEX_WAKE_PRIVATE, 1) = 1
>>> getpeername(6, 0x7fff8256e3a0, [1475252821577171056]) = -1 ENOTCONN (Transport endpoint is not connected)
>>> poll([{fd=42, events=POLLIN}, {fd=-1}, {fd=6, events=POLLIN}, {fd=-1}], 4, 250) = 1 ([{fd=42, revents=POLLIN}])
>>> read(42, "\0", 200) = 1
>>> getpeername(6, 0x7fff8256e3a0, [1475252821577171056]) = -1 ENOTCONN (Transport endpoint is not connected)
>>> poll([{fd=42, events=POLLIN}, {fd=-1}, {fd=6, events=POLLIN}, {fd=-1}, {fd=64, events=POLLIN}], 5, 250) = 1 ([{fd=64, revents=POLLIN}])
>>> futex(0x145f806c, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x145f8068, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1
>>> futex(0x14550730, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
>>
>> I debugged the running process, and gdb printed this stack trace after
>> the segmentation fault:
>>
>> Program received signal SIGSEGV, Segmentation fault.
>> [Switching to Thread 0x63b2b940 (LWP 31976)]
>> 0x000000364fa79140 in strcmp () from /lib64/libc.so.6
>> (gdb) bt
>> #0  0x000000364fa79140 in strcmp () from /lib64/libc.so.6
>> #1  0x00002b188041e4fc in ?? () from /usr/lib64/dirsrv/plugins/libback-ldbm.so
>> #2  0x00002b188041d8d9 in add_hash () from /usr/lib64/dirsrv/plugins/libback-ldbm.so
>> #3  0x00002b188041df27 in ?? () from /usr/lib64/dirsrv/plugins/libback-ldbm.so
>> #4  0x00002b188042c273 in id2entry () from /usr/lib64/dirsrv/plugins/libback-ldbm.so
>> #5  0x00002b18804594c0 in uniqueid2entry () from /usr/lib64/dirsrv/plugins/libback-ldbm.so
>> #6  0x00002b188042b961 in ?? () from /usr/lib64/dirsrv/plugins/libback-ldbm.so
>> #7  0x00002b18804445fc in ldbm_back_delete () from /usr/lib64/dirsrv/plugins/libback-ldbm.so
>> #8  0x00002b187c4990d4 in ?? () from /usr/lib64/dirsrv/libslapd.so.0
>> #9  0x00002b187c499413 in do_delete () from /usr/lib64/dirsrv/libslapd.so.0
>> #10 0x0000000000412e79 in sasl_map_config_add ()
>> #11 0x0000003590827fad in ?? () from /usr/lib64/libnspr4.so
>> #12 0x00000036506064a7 in start_thread () from /lib64/libpthread.so.0
>> #13 0x000000364fad3c2d in clone () from /lib64/libc.so.6
>>
>> I hope this information is useful.
> The stack trace is really useful. Thanks! If possible, could you
> install the debuginfo package and capture the stack trace again?
> yum install 389-ds-base-debuginfo

Hi, I'm a colleague of Francesco, and I'm also following this problem.
We have already installed 389-ds-base-debuginfo, and the stack trace is:

Program received signal SIGSEGV, Segmentation fault.
0x000000364fa79140 in strcmp () from /lib64/libc.so.6
(gdb) bt
#0  0x000000364fa79140 in strcmp () from /lib64/libc.so.6
#1  0x00002b39f5cea4fc in entry_same_dn (e=<value optimized out>, k=0x2aaab800e860) at ldap/servers/slapd/back-ldbm/cache.c:137
#2  0x00002b39f5ce98d9 in add_hash (ht=0x191b1900, key=0x2aaab800e860, keylen=<value optimized out>, entry=0x2aaab800ae00, alt=0x64035b68) at ldap/servers/slapd/back-ldbm/cache.c:185
#3  0x00002b39f5ce9f27 in cache_add_int (cache=0x19105718, e=0x2aaab800ae00, state=0, alt=0x64035c18) at ldap/servers/slapd/back-ldbm/cache.c:1037
#4  0x00002b39f5cf8273 in id2entry (be=0x191aef70, id=1505303, txn=0x0, err=0x64035d58) at ldap/servers/slapd/back-ldbm/id2entry.c:268
#5  0x00002b39f5d254c0 in uniqueid2entry (be=0x191aef70, uniqueid=<value optimized out>, txn=0x0, err=0x64035d58) at ldap/servers/slapd/back-ldbm/uniqueid2entry.c:86
#6  0x00002b39f5cf7961 in find_entry_internal (pb=0x2aaab8008200, be=0x191aef70, addr=<value optimized out>, lock=1, txn=0x0, really_internal=0) at ldap/servers/slapd/back-ldbm/findentry.c:201
#7  0x00002b39f5d105fc in ldbm_back_delete (pb=0x2aaab8008200) at ldap/servers/slapd/back-ldbm/ldbm_delete.c:140
#8  0x00002b39f1d810d4 in op_shared_delete (pb=0x2aaab8008200) at ldap/servers/slapd/delete.c:318
#9  0x00002b39f1d81413 in do_delete (pb=0x2aaab8008200) at ldap/servers/slapd/delete.c:116
#10 0x0000000000412e79 in connection_threadmain () at ldap/servers/slapd/connection.c:548
#11 0x0000003590827fad in ?? () from /usr/lib64/libnspr4.so
#12 0x00000036506064a7 in start_thread () from /lib64/libpthread.so.0
#13 0x000000364fad3c2d in clone () from /lib64/libc.so.6

Thanks

> --noriko
>>>
>>>>> Below are the tails of the error log and of /var/log/messages.
>>>>>
>>>>> [03/Feb/2010:19:20:53 +0100] - import Addressbook2: Workers finished; cleaning up...
>>>>> [03/Feb/2010:19:21:13 +0100] - import Addressbook1: Workers finished; cleaning up...
>>>>> [03/Feb/2010:19:21:13 +0100] - import Addressbook2: Workers cleaned up.
>>>>> [03/Feb/2010:19:21:13 +0100] - import Addressbook2: Indexing complete. Post-processing...
>>>>> [03/Feb/2010:19:21:13 +0100] - import Addressbook1: Workers cleaned up.
>>>>> [03/Feb/2010:19:21:13 +0100] - import Addressbook1: Indexing complete. Post-processing...
>>>>> [03/Feb/2010:19:21:50 +0100] - import Addressbook2: Flushing caches...
>>>>> [03/Feb/2010:19:22:27 +0100] - import Addressbook1: Flushing caches...
>>>>> [03/Feb/2010:19:22:27 +0100] - import Addressbook2: Closing files...
>>>>> [03/Feb/2010:19:22:27 +0100] - import Addressbook1: Closing files...
>>>>> [03/Feb/2010:19:32:27 +0100] - import Addressbook2: Import complete. Processed 3820687 entries in 4957 seconds. (770.77 entries/sec)
>>>>> [03/Feb/2010:19:32:28 +0100] NSMMReplicationPlugin - multimaster_be_state_change: replica o=addressbook2 is coming online; enabling replication
>>>>> [03/Feb/2010:19:32:29 +0100] - import Addressbook1: Import complete. Processed 3820339 entries in 4960 seconds. (770.23 entries/sec)
>>>>> [03/Feb/2010:19:32:29 +0100] NSMMReplicationPlugin - multimaster_be_state_change: replica o=addressbook1 is coming online; enabling replication
>>>>> [03/Feb/2010:19:32:29 +0100] NSMMReplicationPlugin - replica_reload_ruv: Warning: new data for replica o=addressbook1 does not match the data in the changelog.
>>>>> Recreating the changelog file. This could affect replication with replica's consumers in which case the consumers should be reinitialized.
>>>>>
>>>>> Feb 3 19:32:35 mmt-l-al19 kernel: ns-slapd[5575]: segfault at 0000000000000000 rip 000000364fa79140 rsp 0000000056bd3b18 error 4
>>>>>
>>>>> Do you have any ideas?
>>>>>
>>>>> Thanks
>>>>
>>>> --
>>>> 389 users mailing list
>>>> 389-users at lists.fedoraproject.org
>>>> https://admin.fedoraproject.org/mailman/listinfo/389-users
>>>
>>> --
>>> Francesco Fiore
>>> System Integrator
>>> Babel S.r.l. - http://www.babel.it
>>> P.zza S.Benedetto da Norcia, 33 - 00040 Pomezia (Roma)
>>>
>>> CONFIDENTIAL: This message and its attachments are confidential and intended
>>> only for the addressees. If you have received this message in error, please
>>> reply to the sender immediately and delete all of its contents.
>>
>> Thanks