Re: [389-users] Replication and High Availability

Bucl, Casper wrote:
-----Original Message-----
From: fedora-directory-users-bounces@xxxxxxxxxx [mailto:fedora-directory-users-bounces@xxxxxxxxxx] On Behalf Of Rich Megginson
Sent: Tuesday, November 17, 2009 12:35 PM
To: General discussion list for the 389 Directory server project.
Subject: Re: [389-users] Replication and High Availability

Bucl, Casper wrote:
-----Original Message-----
From: fedora-directory-users-bounces@xxxxxxxxxx [mailto:fedora-directory-users-bounces@xxxxxxxxxx] On Behalf Of Rich Megginson
Sent: Tuesday, November 17, 2009 8:23 AM
To: General discussion list for the 389 Directory server project.
Subject: Re: [389-users] Replication and High Availability

Bucl, Casper wrote:
Hi,

I'm trying to create a high-availability LDAP service for a system I have in place that currently uses multimaster replication. Using a shared storage system isn't an option in this case.

To give you an idea of what our setup looks like,

There are two nodes with replication set up. They are configured as multimasters, and processes write to both of them; each change replicates to the other LDAP server.

Now I need them to be in a high availability configuration.

I have created duplicates of each node and gotten the high availability portion on each of them to work correctly.

The problem comes with Fedora DS and replication.

I have tried multiple ways of setting up Fedora DS and replication, and they always seem to end up with changes not being replicated to the other master after we have failed over to the secondary node. The two most successful ones are below.

Configurations.

Full Mesh: All links were set up as two-way replication.

This always ends up with at least two nodes logging errors saying "Can't locate CSN" or "Duplicate node ID" (a quick errors-log check is sketched after the diagram).

Node1A ------- Node1B
  |  \       /  |
  |    \   /    |
  |      X      |
  |    /   \    |
  |  /       \  |
Node2A ------- Node2B
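
(A quick way to spot these errors on each node is to grep the errors log; a rough sketch, assuming the classic 1.0.x layout under /opt/fedora-ds, which may differ on your install:)

  # Look for the two replication symptoms in every instance's errors log:
  grep -E "Can't locate CSN|Duplicate node ID" /opt/fedora-ds/slapd-*/logs/errors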

Single replication agreement between VIPs

In this configuration, we initially copied over the slapd instance directory when setting up the second HA node (Node1A to Node1B) so that the settings and configuration are identical on both. Then, as changes were made to the LDAP data, we created backups using db2bak. These backups are copied over to the failover box and then imported on startup of Fedora DS. This doesn't appear to back up the changelog, and it ends up with the "Can't locate CSN" error again (the backup/restore cycle is sketched after the diagram).

Node1 VIP
    |
    |
Node2 VIP
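
(For reference, a rough sketch of the backup/restore cycle described above; the instance names and the /opt/fedora-ds layout are assumptions for a 1.0.x install:)

  # On the active node (Node1A): archive the backend databases.
  # Note: db2bak captures the databases only, not the replication
  # changelog, which is why the standby later reports "Can't locate CSN".
  /opt/fedora-ds/slapd-node1a/db2bak /opt/fedora-ds/slapd-node1a/bak/failover

  # Ship the archive to the standby (Node1B):
  scp -r /opt/fedora-ds/slapd-node1a/bak/failover node1b:/opt/fedora-ds/slapd-node1b/bak/

  # On the standby, with slapd stopped, restore before starting the server:
  /opt/fedora-ds/slapd-node1b/bak2db /opt/fedora-ds/slapd-node1b/bak/failover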

I have tried other things as well and they were a lot less fruitful than the two examples I have here.

Has anyone set up a high availability scenario similar to this? Can anyone suggest a different process or configuration that would accomplish what I'm after?

Yes. Configurations like this have been working at high-volume installations for several years.

Let's start with - what platform are you running on your systems? What version of DS? What procedure did you use to set up and initialize your replicas?
Thanks,

Casper

The environment is set up using Fedora Directory Server 1.0.4.
What platform?  I would suggest using the latest (1.2.2, or 1.2.4, which is in testing).  We have fixed many, many bugs in replication since 1.0.4, and the CSN issue you are reporting sounds like a bug that has been fixed in 1.2.x.
To set up the multimaster replication I used the mmr.pl script. When reinitializing the consumers, I use ldapmodify and set nsDS5BeginReplicaRefresh to start.
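(Roughly like this; a minimal sketch, where the agreement name "ExampleAgreement", the suffix "dc=example,dc=com", and the host are stand-ins for the real values:)

  # refresh.ldif: kick off a total update (consumer reinitialization)
  # on the supplier that holds the agreement.
  dn: cn=ExampleAgreement,cn=replica,cn="dc=example,dc=com",cn=mapping tree,cn=config
  changetype: modify
  replace: nsDS5BeginReplicaRefresh
  nsDS5BeginReplicaRefresh: start

  ldapmodify -x -h node1a -p 389 -D "cn=Directory Manager" -W -f refresh.ldif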

Another question about the fully meshed configuration: can there be more nodes? We will want to add another HA node to the environment, which would take the total to six directory servers. The idea is that there is a central hub that acts as a kind of global master: everyone replicates their info up to it, and it then gets redistributed back out to the others.
Yes, you can have more than 4 masters.
Would the easier method be to copy the changelog information over to the standby node?
No.
Is there a method to do this?
Not really.
Thanks,
Casper


Hi Rich,
What is the proper way to reinitialize a replication agreement in a multimaster configuration? Whenever I try the nsDS5BeginReplicaRefresh method, it ends up creating a new changelog on the node being refreshed, and then it begins having replication issues, notably the "Can't locate CSN" error. Could this be related to some of the bugs you were speaking of?
Yes. See https://bugzilla.redhat.com/show_bug.cgi?id=388021 - fixed in 1.1.0

What platform are you running on?
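
(For what it's worth, the progress of such a refresh can be watched on the supplier's agreement entry; a minimal sketch, with the host and bind DN as stand-ins:)

  # Poll the agreement entries for the status of the last initialization:
  ldapsearch -x -h node1a -p 389 -D "cn=Directory Manager" -W \
    -b "cn=mapping tree,cn=config" \
    "(objectClass=nsds5replicationAgreement)" \
    nsds5replicaLastInitStatus nsds5replicaLastInitEnd nsds5replicaUpdateInProgress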
Casper



--
389 users mailing list
389-users@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/fedora-directory-users
