I’m facing issues with replication in the following scenario:
Three Rocky Linux nodes running 389 Directory Server version 2.4.5 (B2024.198.0000).
Replication is configured in a ring topology:
node01 -> node02 -> node03 -> node01.
Password changes are made via the PWM-Project web interface.
Problem:
At some point, the synchronization between nodes is lost.
When I attempt to restart replication, the database on the node being initialized crashes.
For example, when initializing replication from node01 to node02, the following error occurs:
---------
[09/Dec/2024:11:32:30.382466035 -0300] - DEBUG - bdb_ldbm_back_wire_import - bdb_bulk_import_queue returned 0 with entry uid=app.tzv.w,OU=APLICACOES,dc=colorado,dc=local
[09/Dec/2024:11:32:30.387198997 -0300] - DEBUG - bdb_ldbm_back_wire_import - bdb_bulk_import_queue returned 0 with entry uid=app.poc.w,OU=APLICACOES,dc=colorado,dc=local
[09/Dec/2024:11:32:30.390378254 -0300] - ERR - factory_destructor - ERROR bulk import abandoned
[09/Dec/2024:11:32:30.557600717 -0300] - ERR - bdb_import_run_pass - import userroot: Thread monitoring returned: -23
[09/Dec/2024:11:32:30.559453847 -0300] - ERR - bdb_public_bdb_import_main - import userroot: Aborting all Import threads...
[09/Dec/2024:11:32:36.468531612 -0300] - ERR - bdb_public_bdb_import_main - import userroot: Import threads aborted.
[09/Dec/2024:11:32:36.470641812 -0300] - INFO - bdb_public_bdb_import_main - import userroot: Closing files...
[09/Dec/2024:11:32:36.553007637 -0300] - ERR - bdb_public_bdb_import_main - import userroot: Import failed.
[09/Dec/2024:11:32:36.574692177 -0300] - DEBUG - NSMMReplicationPlugin - consumer_connection_extension_destructor - Aborting total update in progress for replicated area dc=colorado,dc=local connid=7019159
[09/Dec/2024:11:32:36.577255941 -0300] - ERR - process_bulk_import_op - NULL target sdn
[09/Dec/2024:11:32:36.579573401 -0300] - DEBUG - NSMMReplicationPlugin - replica_relinquish_exclusive_access - conn=7019159 op=-1 repl="dc=colorado,dc=local": Released replica held by locking_purl=conn=7019159 id=3
[09/Dec/2024:11:32:36.600514849 -0300] - ERR - pw_get_admin_users - Search failed for cn=GRP_SRV_PREHASHED_PASSWORD,ou=389,OU=GRUPOS,ou=colorado,dc=colorado,dc=local: error 10 - Password Policy Administrators can not be set
[09/Dec/2024:11:32:36.757883417 -0300] - DEBUG - NSMMReplicationPlugin - decode_startrepl_extop - decoding payload...
[09/Dec/2024:11:32:36.760105387 -0300] - DEBUG - NSMMReplicationPlugin - decode_startrepl_extop - decoded protocol_oid: 2.16.840.1.113730.3.6.1
[09/Dec/2024:11:32:36.762467539 -0300] - DEBUG - NSMMReplicationPlugin - decode_startrepl_extop - decoded repl_root: dc=colorado,dc=local
[09/Dec/2024:11:32:36.765113155 -0300] - DEBUG - NSMMReplicationPlugin - decode_startrepl_extop - decoded csn: 6756ff84000001910000
[09/Dec/2024:11:32:36.767727935 -0300] - DEBUG - NSMMReplicationPlugin - decode_startrepl_extop: RUV:
[09/Dec/2024:11:32:36.769205061 -0300] - DEBUG - NSMMReplicationPlugin - decode_startrepl_extop: {replicageneration} 6748f91f000001910000
[09/Dec/2024:11:32:36.770721824 -0300] - DEBUG - NSMMReplicationPlugin - decode_startrepl_extop: {replica 401 ldap://node01.ldap.colorado.br:389} 6748f921000001910000 6756ff6f000101910000 00000000
[09/Dec/2024:11:32:36.772753378 -0300] - DEBUG - NSMMReplicationPlugin - decode_startrepl_extop: {replica 403 ldap://node03-ldap:389} 6748f9db000101930000 6756ff79000001930000 00000000
[09/Dec/2024:11:32:36.774289526 -0300] - DEBUG - NSMMReplicationPlugin - decode_startrepl_extop: {replica 402 ldap://node02.ldap.colorado.br:389} 6748f996000101920000 6756ff34000001920000 00000000
[09/Dec/2024:11:32:36.775750926 -0300] - DEBUG - NSMMReplicationPlugin - decode_startrepl_extop - Finshed decoding payload.
[09/Dec/2024:11:32:36.777404849 -0300] - DEBUG - NSMMReplicationPlugin - consumer_connection_extension_acquire_exclusive_access - conn=7019230 op=4 Acquired consumer connection extension
[09/Dec/2024:11:32:36.779856975 -0300] - DEBUG - NSMMReplicationPlugin - multisupplier_extop_StartNSDS50ReplicationRequest - conn=7019230 op=4 repl="dc=colorado,dc=local": Begin incremental protocol
[09/Dec/2024:11:32:36.781999075 -0300] - DEBUG - _csngen_adjust_local_time - gen state before 6756ff7b0001:1733754747:0:0
[09/Dec/2024:11:32:36.784626039 -0300] - DEBUG - _csngen_adjust_local_time - gen state after 6756ff840000:1733754756:0:0
[09/Dec/2024:11:32:36.786708353 -0300] - DEBUG - csngen_adjust_time - gen state before 6756ff840000:1733754756:0:0
[09/Dec/2024:11:32:36.788232997 -0300] - DEBUG - csngen_adjust_time - gen state after 6756ff840001:1733754756:0:0
[09/Dec/2024:11:32:36.790217310 -0300] - DEBUG - NSMMReplicationPlugin - replica_get_exclusive_access - conn=7019230 op=4 repl="dc=colorado,dc=local": Acquired replica
----------
To restore synchronization, I need to delete all replication configurations and recreate them. However, the issue reappears after some time.
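For reference, this is roughly the recovery procedure I run each time, sketched with dsconf. The instance name, agreement name, and bind credentials below are placeholders for my actual values:

```shell
# Sketch of the recovery steps on the supplier (node01), with placeholder
# names. Delete the broken agreement toward node02, re-create it, then run
# a total initialization and poll its status.
dsconf slapd-node01 repl-agmt delete --suffix "dc=colorado,dc=local" node01-to-node02

dsconf slapd-node01 repl-agmt create --suffix "dc=colorado,dc=local" \
    --host node02.ldap.colorado.br --port 389 --conn-protocol LDAP \
    --bind-dn "cn=replication manager,cn=config" --bind-passwd "SECRET" \
    --bind-method SIMPLE node01-to-node02

dsconf slapd-node01 repl-agmt init --suffix "dc=colorado,dc=local" node01-to-node02
dsconf slapd-node01 repl-agmt init-status --suffix "dc=colorado,dc=local" node01-to-node02
```

The same steps are then repeated for the node02 -> node03 and node03 -> node01 agreements to rebuild the ring.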
I’d appreciate any suggestions on how to identify and resolve this issue permanently.
Thanks.
--
_______________________________________________
389-users mailing list -- 389-users@xxxxxxxxxxxxxxxxxxxxxxx
To unsubscribe send an email to 389-users-leave@xxxxxxxxxxxxxxxxxxxxxxx
Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: https://lists.fedoraproject.org/archives/list/389-users@xxxxxxxxxxxxxxxxxxxxxxx
Do not reply to spam, report it: https://pagure.io/fedora-infrastructure/new_issue