[389-users] Corrupted database again, with 389 MMR replication and TCP errors

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




We are still having this issue, every so often. Sometimes a 389 database becomes unresponsive to queries, and the log shows errors (full log below)

We started a thread back in Nov, but didnt really get a resolution.
https://www.redhat.com/archives/fedora-directory-users/2009-November/msg00056.html

The only way to fix it seems to be to re-import a replication export from a provider. I am new, so I'm not sure whether this is the right place to report this, or should I submit a bug?

Some background - we have 3 masters that replication to each other and to 2 replication hubs. Those hubs replicate to each other and to 12 slaves in 2 data centers. We have host based firewalls, network based firewalls, and routers, and our replications do pass through firewalls and routers, even between masters. We have reduced,but still see a lot of these errors in the 389 errors log: [11/Dec/2009:00:06:19 -0500] - PR_Write(133) Netscape Portable Runtime error -5961 (TCP connection reset by peer.)

and we have worked to reduce these errors, but have not been able to eliminate them entirely. So, based on that, I am concerned that there are replication sessions that might be getting interrupted, and that might be contributing to the problem.

The situation doesnt happen daily, or neccessarily correlate to all of the TCP errors. But the situation could happen again during the next week or so. What other information and logging should we be gathering to figure out what is happening here? Should we create a bug on this?

This is the log from this morning at 4am on zuber, a slave, where the 'accounts' db was affected:

zuber.iam.gatech.edu: cat errors
       389-Directory/1.2.2 B2009.237.2054
       zuber.iam.gatech.edu:636 (/etc/dirsrv/slapd-zuber)

[11/Dec/2009:00:06:19 -0500] - PR_Write(133) Netscape Portable Runtime error -5961 (TCP connection reset by peer.) [11/Dec/2009:00:06:19 -0500] - PR_Write(75) Netscape Portable Runtime error -5961 (TCP connection reset by peer.) [11/Dec/2009:00:06:20 -0500] - PR_Write(161) Netscape Portable Runtime error -5961 (TCP connection reset by peer.) [11/Dec/2009:00:10:43 -0500] - PR_Write(217) Netscape Portable Runtime error -5961 (TCP connection reset by peer.) [11/Dec/2009:00:11:48 -0500] - PR_Write(246) Netscape Portable Runtime error -5961 (TCP connection reset by peer.) [11/Dec/2009:00:14:20 -0500] - PR_Write(256) Netscape Portable Runtime error -5961 (TCP connection reset by peer.) [11/Dec/2009:00:14:24 -0500] - PR_Write(218) Netscape Portable Runtime error -5961 (TCP connection reset by peer.) [11/Dec/2009:00:14:35 -0500] - PR_Write(243) Netscape Portable Runtime error -5961 (TCP connection reset by peer.) [11/Dec/2009:00:14:45 -0500] - PR_Write(264) Netscape Portable Runtime error -5961 (TCP connection reset by peer.) [11/Dec/2009:00:14:54 -0500] - PR_Write(274) Netscape Portable Runtime error -5961 (TCP connection reset by peer.) [11/Dec/2009:00:14:59 -0500] - PR_Write(284) Netscape Portable Runtime error -5961 (TCP connection reset by peer.) [11/Dec/2009:00:15:00 -0500] - PR_Write(287) Netscape Portable Runtime error -5961 (TCP connection reset by peer.) [11/Dec/2009:00:15:00 -0500] - PR_Write(220) Netscape Portable Runtime error -5961 (TCP connection reset by peer.) [11/Dec/2009:04:34:00 -0500] - PR_Write(83) Netscape Portable Runtime error -5961 (TCP connection reset by peer.) [11/Dec/2009:04:34:04 -0500] - PR_Write(224) Netscape Portable Runtime error -5961 (TCP connection reset by peer.) [11/Dec/2009:04:34:07 -0500] - PR_Write(94) Netscape Portable Runtime error -5961 (TCP connection reset by peer.) [11/Dec/2009:04:34:07 -0500] - PR_Write(144) Netscape Portable Runtime error -5961 (TCP connection reset by peer.) [11/Dec/2009:04:34:08 -0500] - PR_Write(222) Netscape Portable Runtime error -5961 (TCP connection reset by peer.) [11/Dec/2009:04:34:08 -0500] - PR_Write(227) Netscape Portable Runtime error -5961 (TCP connection reset by peer.) [11/Dec/2009:04:34:08 -0500] - PR_Write(233) Netscape Portable Runtime error -5961 (TCP connection reset by peer.) [11/Dec/2009:04:34:59 -0500] - ldbm: 'accounts' is already in the middle of another task and cannot be disturbed.
[11/Dec/2009:04:34:59 -0500] - slapi_start_bulk_import: failed; error = -23
[11/Dec/2009:04:35:00 -0500] NSMMReplicationPlugin - conn=2847000 op=178 replica="ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu": Unable to acquire replica: error: internal error [11/Dec/2009:04:35:00 -0500] NSMMReplicationPlugin - multimaster_be_state_change: replica ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu is going offline; disabling replication [11/Dec/2009:04:35:02 -0500] NSMMReplicationPlugin - replica_replace_ruv_tombstone: failed to update replication update vector for replica ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu: LDAP error - 1 [11/Dec/2009:04:35:05 -0500] NSMMReplicationPlugin - replica_replace_ruv_tombstone: failed to update replication update vector for replica ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu: LDAP error - 1 [11/Dec/2009:04:36:02 -0500] NSMMReplicationPlugin - replica_replace_ruv_tombstone: failed to update replication update vector for replica ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu: LDAP error - 1 [11/Dec/2009:04:36:13 -0500] NSMMReplicationPlugin - replica_replace_ruv_tombstone: failed to update replication update vector for replica ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu: LDAP error - 1 [11/Dec/2009:04:36:25 -0500] NSMMReplicationPlugin - replica_replace_ruv_tombstone: failed to update replication update vector for replica ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu: LDAP error - 1 [11/Dec/2009:04:36:39 -0500] NSMMReplicationPlugin - replica_replace_ruv_tombstone: failed to update replication update vector for replica ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu: LDAP error - 1 [11/Dec/2009:04:36:46 -0500] - PR_Write(220) Netscape Portable Runtime error -5961 (TCP connection reset by peer.) [11/Dec/2009:04:37:09 -0500] NSMMReplicationPlugin - replica_replace_ruv_tombstone: failed to update replication update vector for replica ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu: LDAP error - 1 [11/Dec/2009:04:37:12 -0500] NSMMReplicationPlugin - replica_replace_ruv_tombstone: failed to update replication update vector for replica ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu: LDAP error - 1 [11/Dec/2009:04:37:15 -0500] NSMMReplicationPlugin - replica_replace_ruv_tombstone: failed to update replication update vector for replica ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu: LDAP error - 1 [11/Dec/2009:04:37:35 -0500] NSMMReplicationPlugin - replica_replace_ruv_tombstone: failed to update replication update vector for replica ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu: LDAP error - 1 [11/Dec/2009:04:38:03 -0500] NSMMReplicationPlugin - replica_replace_ruv_tombstone: failed to update replication update vector for replica ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu: LDAP error - 1 [11/Dec/2009:04:38:09 -0500] NSMMReplicationPlugin - replica_replace_ruv_tombstone: failed to update replication update vector for replica ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu: LDAP error - 1 [11/Dec/2009:04:38:31 -0500] NSMMReplicationPlugin - replica_replace_ruv_tombstone: failed to update replication update vector for replica ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu: LDAP error - 1 [11/Dec/2009:04:38:37 -0500] NSMMReplicationPlugin - replica_replace_ruv_tombstone: failed to update replication update vector for replica ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu: LDAP error - 1 [11/Dec/2009:04:39:33 -0500] NSMMReplicationPlugin - replica_replace_ruv_tombstone: failed to update replication update vector for replica ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu: LDAP error - 1 [11/Dec/2009:04:39:37 -0500] NSMMReplicationPlugin - replica_replace_ruv_tombstone: failed to update replication update vector for replica ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu: LDAP error - 1 [11/Dec/2009:04:39:44 -0500] NSMMReplicationPlugin - replica_replace_ruv_tombstone: failed to update replication update vector for replica ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu: LDAP error - 1 [11/Dec/2009:04:39:46 -0500] NSMMReplicationPlugin - replica_replace_ruv_tombstone: failed to update replication update vector for replica ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu: LDAP error - 1 [11/Dec/2009:04:39:49 -0500] NSMMReplicationPlugin - replica_replace_ruv_tombstone: failed to update replication update vector for replica ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu: LDAP error - 1 [11/Dec/2009:04:39:52 -0500] NSMMReplicationPlugin - replica_replace_ruv_tombstone: failed to update replication update vector for replica ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu: LDAP error - 1 [11/Dec/2009:04:39:56 -0500] NSMMReplicationPlugin - replica_replace_ruv_tombstone: failed to update replication update vector for replica ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu: LDAP error - 1 [11/Dec/2009:04:39:59 -0500] NSMMReplicationPlugin - replica_replace_ruv_tombstone: failed to update replication update vector for replica ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu: LDAP error - 1 [11/Dec/2009:04:40:07 -0500] NSMMReplicationPlugin - replica_replace_ruv_tombstone: failed to update replication update vector for replica ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu: LDAP error - 1 [11/Dec/2009:04:40:12 -0500] NSMMReplicationPlugin - replica_replace_ruv_tombstone: failed to update replication update vector for replica ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu: LDAP error - 1 [11/Dec/2009:04:40:15 -0500] NSMMReplicationPlugin - replica_replace_ruv_tombstone: failed to update replication update vector for replica ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu: LDAP error - 1 [11/Dec/2009:04:40:20 -0500] NSMMReplicationPlugin - replica_replace_ruv_tombstone: failed to update replication update vector for replica ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu: LDAP error - 1 [11/Dec/2009:04:40:25 -0500] NSMMReplicationPlugin - replica_replace_ruv_tombstone: failed to update replication update vector for replica ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu: LDAP error - 1 [11/Dec/2009:04:40:31 -0500] NSMMReplicationPlugin - replica_replace_ruv_tombstone: failed to update replication update vector for replica ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu: LDAP error - 1 [11/Dec/2009:04:40:37 -0500] NSMMReplicationPlugin - replica_replace_ruv_tombstone: failed to update replication update vector for replica ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu: LDAP error - 1 [11/Dec/2009:04:40:46 -0500] NSMMReplicationPlugin - replica_replace_ruv_tombstone: failed to update replication update vector for replica ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu: LDAP error - 1 [11/Dec/2009:04:40:51 -0500] - WARNING: Import is running with nsslapd-db-private-import-mem on; No other process is allowed to access the database
[11/Dec/2009:04:40:51 -0500] - ERROR bulk import abandoned
[11/Dec/2009:04:40:51 -0500] - import accounts: Aborting all import threads...
[11/Dec/2009:04:41:00 -0500] - import accounts: Import threads aborted.
[11/Dec/2009:04:41:00 -0500] - import accounts: Closing files...
[11/Dec/2009:04:41:00 -0500] - import accounts: Import failed.
[11/Dec/2009:04:41:00 -0500] NSMMReplicationPlugin - replica_replace_ruv_tombstone: failed to update replication update vector for replica ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu: LDAP error - 1 [11/Dec/2009:04:41:29 -0500] NSMMReplicationPlugin - replica_replace_ruv_tombstone: failed to update replication update vector for replica ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu: LDAP error - 1

 (this error then repeats for hours)

--
389 users mailing list
389-users@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/fedora-directory-users

[Index of Archives]     [Fedora Directory Users]     [Fedora Directory Devel]     [Fedora Announce]     [Fedora Legacy Announce]     [Kernel]     [Fedora Legacy]     [Share Photos]     [Fedora Desktop]     [PAM]     [Red Hat Watch]     [Red Hat Development]     [Big List of Linux Books]     [Gimp]     [Yosemite News]

  Powered by Linux