Robert Viduya wrote:
> I didn't get a response to my previous request for help and our
> situation degenerated (we lost 3 of our 4 masters) to the point where
> I felt we had to do a clean rebuild. We did that late last week into
> the weekend and had set up 2 masters and assorted hubs and slaves.
> We used a clean ldif file to import into the first master, so no
> previous replica IDs were carried over from the previous environment.
>
> We are running directory version 1.2.2 on RHEL5.4, both 64-bit.
>
> Things were running fine until this morning, when one of our masters
> started reporting errors. We found this in its errorlog:
>
> [10/Nov/2009:08:56:27 -0500] NSMMReplicationPlugin -
> multimaster_be_state_change: replica
> ou=people,dc=gted,dc=gatech,dc=edu is going offline; disabling
> replication
> [10/Nov/2009:08:59:29 -0500] - WARNING: Import is running with
> nsslapd-db-private-import-mem on; No other process is allowed to
> access the database
> [10/Nov/2009:08:59:33 -0500] - ERROR bulk import abandoned
> [10/Nov/2009:08:59:34 -0500] - import people: Aborting all import
> threads...
> [10/Nov/2009:08:59:42 -0500] - import people: Import threads aborted.
> [10/Nov/2009:08:59:43 -0500] - import people: Closing files...
> [10/Nov/2009:08:59:43 -0500] - import people: Import failed.
> [10/Nov/2009:09:01:51 -0500] NSMMReplicationPlugin -
> replica_replace_ruv_tombstone: failed to update replication update
> vector for replica ou=people,dc=gted,dc=gatech,dc=edu: LDAP error - 1
> [10/Nov/2009:09:01:57 -0500] NSMMReplicationPlugin -
> replica_replace_ruv_tombstone: failed to update replication update
> vector for replica ou=people,dc=gted,dc=gatech,dc=edu: LDAP error - 1
> [10/Nov/2009:09:02:01 -0500] NSMMReplicationPlugin -
> replica_replace_ruv_tombstone: failed to update replication update
> vector for replica ou=people,dc=gted,dc=gatech,dc=edu: LDAP error - 1
> [10/Nov/2009:09:02:21 -0500] NSMMReplicationPlugin -
> replica_replace_ruv_tombstone: failed to update replication update
> vector for replica ou=people,dc=gted,dc=gatech,dc=edu: LDAP error - 1
> [10/Nov/2009:09:02:26 -0500] NSMMReplicationPlugin -
> replica_replace_ruv_tombstone: failed to update replication update
> vector for replica ou=people,dc=gted,dc=gatech,dc=edu: LDAP error - 1
> [10/Nov/2009:09:02:32 -0500] NSMMReplicationPlugin -
> replica_replace_ruv_tombstone: failed to update replication update
> vector for replica ou=people,dc=gted,dc=gatech,dc=edu: LDAP error - 1
>
>
> That last line repeats until we brought the server down. The log
> _looks_ like someone/something triggered an import operation, but
> no-one did, on either master.
>
> The errorlog on the other master shows the following:
>
> [10/Nov/2009:08:39:29 -0500] - repl5_inc_waitfor_async_results timed
> out waiting for responses: 38 46
> [10/Nov/2009:08:39:54 -0500] NSMMReplicationPlugin - agmt="cn=people
> rewbell gertrude" (gertrude:636): Warning: unable to receive
> endReplication extended operation response (Bad parameter to an ldap
> routine)
> [10/Nov/2009:08:40:04 -0500] NSMMReplicationPlugin - agmt="cn=people
> rewbell gertrude" (gertrude:636): Unable to receive the response for a
> startReplication extended operation to consumer (Bad parameter to an
> ldap routine). Will retry later.
> [10/Nov/2009:08:40:08 -0500] NSMMReplicationPlugin - agmt="cn=people
> rewbell gertrude" (gertrude:636): Unable to receive the response for a
> startReplication extended operation to consumer (Bad parameter to an
> ldap routine). Will retry later.
> [10/Nov/2009:08:40:14 -0500] NSMMReplicationPlugin - agmt="cn=people
> rewbell gertrude" (gertrude:636): Unable to receive the response for a
> startReplication extended operation to consumer (Bad parameter to an
> ldap routine). Will retry later.
> [10/Nov/2009:08:40:38 -0500] NSMMReplicationPlugin - agmt="cn=people
> rewbell gertrude" (gertrude:636): Unable to receive the response for a
> startReplication extended operation to consumer (Bad parameter to an
> ldap routine). Will retry later.
> [10/Nov/2009:08:43:05 -0500] NSMMReplicationPlugin - agmt="cn=people
> rewbell gertrude" (gertrude:636): Unable to receive the response for a
> startReplication extended operation to consumer (Bad parameter to an
> ldap routine). Will retry later.
> [10/Nov/2009:08:44:50 -0500] - repl5_inc_waitfor_async_results timed
> out waiting for responses: 6 8
> [10/Nov/2009:08:47:08 -0500] NSMMReplicationPlugin - agmt="cn=people
> rewbell gertrude" (gertrude:636): Unable to receive the response for a
> startReplication extended operation to consumer (Bad parameter to an
> ldap routine). Will retry later.
> [10/Nov/2009:08:47:08 -0500] NSMMReplicationPlugin - agmt="cn=people
> rewbell gertrude" (gertrude:636): Incremental protocol: event
> backoff_timer_expired should not occur in state start_backoff
> [10/Nov/2009:08:47:12 -0500] NSMMReplicationPlugin - agmt="cn=people
> rewbell gertrude" (gertrude:636): Unable to receive the response for a
> startReplication extended operation to consumer (Bad parameter to an
> ldap routine). Will retry later.
> [10/Nov/2009:08:47:18 -0500] NSMMReplicationPlugin - agmt="cn=people
> rewbell gertrude" (gertrude:636): Incremental update failed and
> requires administrator action
> [10/Nov/2009:08:55:01 -0500] - repl5_inc_waitfor_async_results timed
> out waiting for responses: 13 14
> [10/Nov/2009:08:55:01 -0500] - repl5_inc_waitfor_async_results timed
> out waiting for responses: 59 81
> [10/Nov/2009:08:55:14 -0500] NSMMReplicationPlugin - agmt="cn=people
> rewbell gertrude" (gertrude:636): Warning: unable to receive
> endReplication extended operation response (Bad parameter to an ldap
> routine)
> [10/Nov/2009:08:55:24 -0500] NSMMReplicationPlugin - agmt="cn=people
> rewbell gertrude" (gertrude:636): Unable to receive the response for a
> startReplication extended operation to consumer (Bad parameter to an
> ldap routine). Will retry later.
> [10/Nov/2009:08:55:28 -0500] NSMMReplicationPlugin - agmt="cn=people
> rewbell gertrude" (gertrude:636): Unable to receive the response for a
> startReplication extended operation to consumer (Bad parameter to an
> ldap routine). Will retry later.
> [10/Nov/2009:08:55:34 -0500] NSMMReplicationPlugin - agmt="cn=people
> rewbell gertrude" (gertrude:636): Unable to receive the response for a
> startReplication extended operation to consumer (Bad parameter to an
> ldap routine). Will retry later.
> [10/Nov/2009:08:55:46 -0500] NSMMReplicationPlugin - agmt="cn=people
> rewbell gertrude" (gertrude:636): Unable to receive the response for a
> startReplication extended operation to consumer (Bad parameter to an
> ldap routine). Will retry later.
> [10/Nov/2009:08:56:10 -0500] NSMMReplicationPlugin - agmt="cn=people
> rewbell gertrude" (gertrude:636): Unable to receive the response for a
> startReplication extended operation to consumer (Bad parameter to an
> ldap routine). Will retry later.
> [10/Nov/2009:08:56:58 -0500] NSMMReplicationPlugin - agmt="cn=people
> rewbell gertrude" (gertrude:636): Unable to receive the response for a
> startReplication extended operation to consumer (Bad parameter to an
> ldap routine). Will retry later.
> [10/Nov/2009:08:58:34 -0500] NSMMReplicationPlugin - agmt="cn=people
> rewbell gertrude" (gertrude:636): Replication bind with SIMPLE auth
> resumed
> [10/Nov/2009:09:01:47 -0500] NSMMReplicationPlugin - agmt="cn=people
> rewbell gertrude" (gertrude:636): Consumer failed to replay change
> (uniqueid 51dccc08-9efe11de-8efe8516-22c1043e, CSN
> 4af96f8a000200370000): Operations error. Will retry later.
> [10/Nov/2009:09:01:47 -0500] NSMMReplicationPlugin - agmt="cn=people
> rewbell gertrude" (gertrude:636): Consumer failed to replay change
> (uniqueid 5ad5610c-1dd211b2-80b9be51-952a0000, CSN
> 4af96f8b000000370000): Operations error. Will retry later.
> [10/Nov/2009:09:01:47 -0500] NSMMReplicationPlugin - agmt="cn=people
> rewbell gertrude" (gertrude:636): Consumer failed to replay change
> (uniqueid 213cd58e-cd7b11de-b535d108-950067b1, CSN
> 4af96fcf000000370000): Operations error. Will retry later.
>
> Again, that last line repeats until we brought down the errant server.
>
> We've seen this behavior a few times since upgrading. One of our
> masters somehow thinks it's supposed to do an import and trashes its
> copy of the data. No person had triggered an import or a
> supplier->consumer initialization. Are there conditions where the
> directory server itself would trigger such an operation autonomously?

No. Check the access log to see what operations were submitted to the
directory server at or around [10/Nov/2009:08:56:27 -0500].

Are your servers in time sync?

Is "cn=people rewbell gertrude" the agreement that sends updates to the
master that is having the spontaneous import problem?
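
If it helps with the access log check, here is a minimal Python sketch for pulling out the entries within a few minutes of a given timestamp. It assumes only that each log line starts with a bracketed timestamp in the same form as the error log excerpts quoted above, and that month names are in English (the `%b` directive is locale-dependent). The function names and the five-minute window are illustrative choices, not part of any directory server tooling.

```python
import re
from datetime import datetime, timedelta

# Leading bracketed timestamp, e.g. [10/Nov/2009:08:56:27 -0500]
TS_RE = re.compile(r"^\[(\d{2}/[A-Za-z]{3}/\d{4}:\d{2}:\d{2}:\d{2} [+-]\d{4})\]")
TS_FMT = "%d/%b/%Y:%H:%M:%S %z"


def parse_ts(line):
    """Return the timezone-aware timestamp at the start of a line, or None."""
    m = TS_RE.match(line)
    if m is None:
        return None
    return datetime.strptime(m.group(1), TS_FMT)


def lines_near(lines, center, window_minutes=5):
    """Return the lines whose timestamp is within +/- window_minutes of center.

    `center` is a timestamp string without the brackets,
    e.g. "10/Nov/2009:08:56:27 -0500". Lines without a leading
    timestamp are skipped.
    """
    target = datetime.strptime(center, TS_FMT)
    delta = timedelta(minutes=window_minutes)
    hits = []
    for line in lines:
        ts = parse_ts(line)
        if ts is not None and abs(ts - target) <= delta:
            hits.append(line.rstrip("\n"))
    return hits


if __name__ == "__main__":
    # Two error log lines from the report above, used only as sample input.
    sample = [
        "[10/Nov/2009:08:39:29 -0500] - repl5_inc_waitfor_async_results timed out waiting for responses: 38 46",
        "[10/Nov/2009:08:59:33 -0500] - ERROR bulk import abandoned",
    ]
    for hit in lines_near(sample, "10/Nov/2009:08:56:27 -0500"):
        print(hit)
```

To run it against a real access log, pass the open file object as `lines`. Because the timestamps are parsed with their UTC offsets, entries from servers logging in different time zones are compared correctly, but the comparison is still only as good as the clocks on the machines that wrote the logs.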