Hi,
You are describing two different symptoms:
- replication gets desynchronized.
- replication initialization fails
We do not have much data about the first one. It could have many causes.
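One way to gather data about the first symptom is to compare the CSNs that appear in the RUV lines of the error log (for example "decoded csn: 6756ff84000001910000"). Below is a minimal sketch for decoding such a value. It assumes the usual 20-hex-digit layout (8 digits of Unix timestamp, 4 of sequence number, 4 of replica id, 4 of sub-sequence number); the function names are only illustrative and not part of any 389 DS tool.

```python
from datetime import datetime, timezone


def decode_csn(csn: str) -> dict:
    """Split a 20-hex-digit CSN string into its numeric fields.

    Assumed layout: timestamp(8) seqnum(4) replica_id(4) subseqnum(4).
    """
    if len(csn) != 20:
        raise ValueError(f"expected 20 hex digits, got {len(csn)}")
    return {
        "timestamp": int(csn[0:8], 16),     # seconds since the Unix epoch
        "seqnum": int(csn[8:12], 16),
        "replica_id": int(csn[12:16], 16),
        "subseqnum": int(csn[16:20], 16),
    }


def csn_time(csn: str) -> datetime:
    """Return the CSN timestamp as a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(decode_csn(csn)["timestamp"], tz=timezone.utc)


if __name__ == "__main__":
    # CSN taken from the "decoded csn" line of the quoted log.
    print(decode_csn("6756ff84000001910000"))
    print(csn_time("6756ff84000001910000").isoformat())
```

Comparing the decoded timestamp of the max CSN for each replica id across the nodes gives a rough idea of which node stopped receiving updates and when.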
About the second one:
The "bdb_bulk_import_queue returned 0 with entry ..." messages are normal.
The first error is:
[09/Dec/2024:11:32:30.390378254 -0300] - ERR - factory_destructor - ERROR bulk import abandoned
This happens on the server being initialized when the connection handling the replication initialization is closed unexpectedly.
The usual causes are:
[1] there was a problem on the supplier from which replication is being initialized.
(Are there errors in that supplier's error log, or has it crashed?)
[2] the network is unreliable (that could also explain why replication gets desynchronized)
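To check point [1], the supplier's error log can be scanned for ERR entries around the time of the failure. Here is a rough sketch that assumes the line format seen in the quoted log ("[timestamp] - LEVEL - message"); the regular expression and the function name are my own and not something shipped with the server.

```python
import re

# Matches lines such as:
# [09/Dec/2024:11:32:30.390378254 -0300] - ERR - factory_destructor - ERROR bulk import abandoned
LOG_LINE = re.compile(r"^\[(?P<ts>[^\]]+)\] - (?P<level>[A-Z]+) - (?P<rest>.*)$")


def error_lines(lines):
    """Yield (timestamp, message) for every line logged at level ERR."""
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match and match.group("level") == "ERR":
            yield match.group("ts"), match.group("rest")


if __name__ == "__main__":
    sample = [
        "[09/Dec/2024:11:32:30.387198997 -0300] - DEBUG - bdb_ldbm_back_wire_import - "
        "bdb_bulk_import_queue returned 0 with entry uid=app.poc.w,OU=APLICACOES,dc=colorado,dc=local",
        "[09/Dec/2024:11:32:30.390378254 -0300] - ERR - factory_destructor - "
        "ERROR bulk import abandoned",
    ]
    for ts, message in error_lines(sample):
        print(ts, "|", message)
```

Running the same filter over the supplier's log (pass an open file object instead of the sample list) for the same time window should show whether the supplier reported a problem before the connection was closed.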
regards
Pierre
On Mon, Dec 9, 2024 at 5:20 PM Luiz Gustavo Quirino via 389-users <389-users@xxxxxxxxxxxxxxxxxxxxxxx> wrote:
I’m facing issues with replication in the following scenario:
3 Linux nodes (Rocky) running version 2.4.5 B2024.198.0000 of 389.
Replication is configured in a ring topology:
node01 -> node02 -> node03 -> node01.
Password changes are made via the PWM-Project web interface.
Problem:
At some point, the synchronization between nodes is lost.
When I attempt to restart replication, the node being updated crashes the database.
For example, when initializing replication from node01 to node02, the following error occurs:
---------
[09/Dec/2024:11:32:30.382466035 -0300] - DEBUG - bdb_ldbm_back_wire_import - bdb_bulk_import_queue returned 0 with entry uid=app.tzv.w,OU=APLICACOES,dc=colorado,dc=local
[09/Dec/2024:11:32:30.387198997 -0300] - DEBUG - bdb_ldbm_back_wire_import - bdb_bulk_import_queue returned 0 with entry uid=app.poc.w,OU=APLICACOES,dc=colorado,dc=local
[09/Dec/2024:11:32:30.390378254 -0300] - ERR - factory_destructor - ERROR bulk import abandoned
[09/Dec/2024:11:32:30.557600717 -0300] - ERR - bdb_import_run_pass - import userroot: Thread monitoring returned: -23
[09/Dec/2024:11:32:30.559453847 -0300] - ERR - bdb_public_bdb_import_main - import userroot: Aborting all Import threads...
[09/Dec/2024:11:32:36.468531612 -0300] - ERR - bdb_public_bdb_import_main - import userroot: Import threads aborted.
[09/Dec/2024:11:32:36.470641812 -0300] - INFO - bdb_public_bdb_import_main - import userroot: Closing files...
[09/Dec/2024:11:32:36.553007637 -0300] - ERR - bdb_public_bdb_import_main - import userroot: Import failed.
[09/Dec/2024:11:32:36.574692177 -0300] - DEBUG - NSMMReplicationPlugin - consumer_connection_extension_destructor - Aborting total update in progress for replicated area dc=colorado,dc=local connid=7019159
[09/Dec/2024:11:32:36.577255941 -0300] - ERR - process_bulk_import_op - NULL target sdn
[09/Dec/2024:11:32:36.579573401 -0300] - DEBUG - NSMMReplicationPlugin - replica_relinquish_exclusive_access - conn=7019159 op=-1 repl="dc=colorado,dc=local": Released replica held by locking_purl=conn=7019159 id=3
[09/Dec/2024:11:32:36.600514849 -0300] - ERR - pw_get_admin_users - Search failed for cn=GRP_SRV_PREHASHED_PASSWORD,ou=389,OU=GRUPOS,ou=colorado,dc=colorado,dc=local: error 10 - Password Policy Administrators can not be set
[09/Dec/2024:11:32:36.757883417 -0300] - DEBUG - NSMMReplicationPlugin - decode_startrepl_extop - decoding payload...
[09/Dec/2024:11:32:36.760105387 -0300] - DEBUG - NSMMReplicationPlugin - decode_startrepl_extop - decoded protocol_oid: 2.16.840.1.113730.3.6.1
[09/Dec/2024:11:32:36.762467539 -0300] - DEBUG - NSMMReplicationPlugin - decode_startrepl_extop - decoded repl_root: dc=colorado,dc=local
[09/Dec/2024:11:32:36.765113155 -0300] - DEBUG - NSMMReplicationPlugin - decode_startrepl_extop - decoded csn: 6756ff84000001910000
[09/Dec/2024:11:32:36.767727935 -0300] - DEBUG - NSMMReplicationPlugin - decode_startrepl_extop: RUV:
[09/Dec/2024:11:32:36.769205061 -0300] - DEBUG - NSMMReplicationPlugin - decode_startrepl_extop: {replicageneration} 6748f91f000001910000
[09/Dec/2024:11:32:36.770721824 -0300] - DEBUG - NSMMReplicationPlugin - decode_startrepl_extop: {replica 401 ldap://node01.ldap.colorado.br:389} 6748f921000001910000 6756ff6f000101910000 00000000
[09/Dec/2024:11:32:36.772753378 -0300] - DEBUG - NSMMReplicationPlugin - decode_startrepl_extop: {replica 403 ldap://node03-ldap:389} 6748f9db000101930000 6756ff79000001930000 00000000
[09/Dec/2024:11:32:36.774289526 -0300] - DEBUG - NSMMReplicationPlugin - decode_startrepl_extop: {replica 402 ldap://node02.ldap.colorado.br:389} 6748f996000101920000 6756ff34000001920000 00000000
[09/Dec/2024:11:32:36.775750926 -0300] - DEBUG - NSMMReplicationPlugin - decode_startrepl_extop - Finshed decoding payload.
[09/Dec/2024:11:32:36.777404849 -0300] - DEBUG - NSMMReplicationPlugin - consumer_connection_extension_acquire_exclusive_access - conn=7019230 op=4 Acquired consumer connection extension
[09/Dec/2024:11:32:36.779856975 -0300] - DEBUG - NSMMReplicationPlugin - multisupplier_extop_StartNSDS50ReplicationRequest - conn=7019230 op=4 repl="dc=colorado,dc=local": Begin incremental protocol
[09/Dec/2024:11:32:36.781999075 -0300] - DEBUG - _csngen_adjust_local_time - gen state before 6756ff7b0001:1733754747:0:0
[09/Dec/2024:11:32:36.784626039 -0300] - DEBUG - _csngen_adjust_local_time - gen state after 6756ff840000:1733754756:0:0
[09/Dec/2024:11:32:36.786708353 -0300] - DEBUG - csngen_adjust_time - gen state before 6756ff840000:1733754756:0:0
[09/Dec/2024:11:32:36.788232997 -0300] - DEBUG - csngen_adjust_time - gen state after 6756ff840001:1733754756:0:0
[09/Dec/2024:11:32:36.790217310 -0300] - DEBUG - NSMMReplicationPlugin - replica_get_exclusive_access - conn=7019230 op=4 repl="dc=colorado,dc=local": Acquired replica
----------
To restore synchronization, I need to delete all replication configurations and recreate them. However, the issue reappears after some time.
I’d appreciate any suggestions on how to identify and resolve this issue permanently.
Thanks.
_______________________________________________
389-users mailing list -- 389-users@xxxxxxxxxxxxxxxxxxxxxxx
To unsubscribe send an email to 389-users-leave@xxxxxxxxxxxxxxxxxxxxxxx
Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: https://lists.fedoraproject.org/archives/list/389-users@xxxxxxxxxxxxxxxxxxxxxxx
Do not reply to spam, report it: https://pagure.io/fedora-infrastructure/new_issue
--
389 Directory Server Development Team