> On 4 Jan 2021, at 08:06, Glenn Morris <rgm@xxxxxxxxxxxx> wrote:
>
> Hi,
>
> I'm using version 1.4.3 on CentOS 8.3.
> I'm trying to set up replication with a single master and a single consumer,
> following the steps from
>
> https://access.redhat.com/documentation/en-us/red_hat_directory_server/11/html/administration_guide/single-master_replication#setting_up_single-master_replication_using_the_command_line
>
> It seems to work, in that the database is populated on the consumer, and
> when I change a database entry on the master, the change appears on the
> consumer.
>
> However, replication status commands seem (?) to indicate that something
> isn't working completely right. Eg when I do:
>
> dsconf -w "$passwd" -D "$rootdn" $instance repl-agmt status \
>   --suffix $suffix $agreement
>
> I get:
>
> Replica Enabled: on
> Update In Progress: FALSE
> Last Update Start: 20210103213704Z
> Last Update End: 20210103213704Z
> Number Of Changes Sent: 1:1/0
> Number Of Changes Skipped: None
> Last Update Status: Error (0) Replica acquired successfully: Incremental
> update succeeded
> Last Init Start: 19700101000000Z
> Last Init End: 19700101000000Z
> Last Init Status: unavailable
> Reap Active: 0
> Replication Status: Not in Synchronization: supplier
> (5ff237d3000000010000) consumer (Unavailable) State (green) Reason
> (error (0) replica acquired successfully: incremental update succeeded)
> Replication Lag Time: Unavailable
>
> The last two entries seem to indicate some problem?
>
> In the logs on the consumer, I see the following entries that I think
> might be (?)
> related to replication:
> conn=29 fd=64 slot=64 SSL connection from MASTER.IP to MY.IP
> conn=29 op=-1 fd=64 closed - unknown error
>
> If I increase the logging level, I get:
> DEBUG - connection_read_operation - connection 77 waited 1 times for
> read to be ready
> DEBUG - connection_read_operation - PR_Recv for connection 77
> returns -12109 (unknown error)
> DEBUG - disconnect_server_nomutex_ext - Setting conn 77 fd=64 to
> be disconnected: reason -12109
>
> Also, when I restart my consumer for the very first time after setting
> up the replication agreement, ns-slapd reliably hangs using 100% CPU.
> Strace shows endless:
>
> select(0, NULL, NULL, NULL, {tv_sec=0, tv_usec=0}) = 0 (Timeout)
> poll([{fd=22, events=POLLIN}], 1, 0) = 0 (Timeout)
>
> where fd/22 = a pipe.
> If I kill -9 it, it starts working.
> I'm not sure if this has any relation.
>
> Thanks for this. Indeed, if I replace "--port=636 --conn-protocol=LDAPS"
> (from "Steps to be Performed on the Supplier" in the Red Hat docs)
> with "--port=389 --conn-protocol=StartTLS" when running "repl-agmt create",
> then the status command reports "Replication Status: In Synchronization"
> (after the first change is synced). It leaves me wondering a bit how
> secure it is though...

StartTLS over 389 is "effectively" equivalent in strength to LDAPS, at least as far as replication security goes. LDAPS is preferred though.

That said, if StartTLS is working but LDAPS is not, that points to something else fishy. StartTLS and LDAPS both use the same CA verification routines and connection/TLS machinery, so perhaps there is a problem in network connectivity or some redirection from LDAPS.

A good start would be some basic checks that ldapwhoami/ldapsearch work over ldaps:// to all the servers in your topology. Then do the same ldapwhoami/ldapsearch from each node in the topology to the others, to ensure nothing in between is causing issues.
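The basic checks above could be scripted roughly as follows. This is only a sketch: the host names are placeholders, I'm assuming the default LDAPS port 636 and simple binds (-x), and the bind DN falls back to "cn=Directory Manager" if $rootdn is not set. It only prints the commands, so you can review them and then run them by hand on each node.

```shell
#!/bin/sh
# Placeholder host names (assumption): replace with the servers in your topology.
HOSTS="master.example.com consumer.example.com"
# Bind DN: reuse $rootdn from the dsconf commands if set, else an assumed default.
ROOTDN="${rootdn:-cn=Directory Manager}"

# Print an ldapwhoami command that tests a simple bind over ldaps:// to one host.
# -W prompts for the password so it never appears in the printed command.
ldaps_check_cmd() {
    printf 'ldapwhoami -H ldaps://%s:636 -x -D "%s" -W\n' "$1" "$ROOTDN"
}

# Print an ldapsearch command that reads the root DSE over ldaps:// from one host.
rootdse_check_cmd() {
    printf 'ldapsearch -H ldaps://%s:636 -x -b "" -s base namingContexts\n' "$1"
}

# Emit both checks for every host; run the printed commands from each node in turn.
for h in $HOSTS; do
    ldaps_check_cmd "$h"
    rootdse_check_cmd "$h"
done
```

If one of the printed commands fails only from a particular node, that narrows it down to something between those two machines. Adding "-d 1" to the ldapwhoami call should show more of the TLS negotiation.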
--
Sincerely,

William Brown

Senior Software Engineer, 389 Directory Server
SUSE Labs, Australia