389 doesn't seem to handle replica removal very well. The removed
replica's element remains in the RUV on all other servers. The
http://port389.org/wiki/Howto:CLEANRUV task exists to remove that stale
element, but you have to run it on all masters simultaneously, or the
removed replica will show up again, replicated from another master that
still has the RUV element. Furthermore, it may not be possible to run
CLEANRUV if a master is down for some reason.
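For reference, a minimal sketch of driving the task from a script,
assuming the nsds5task / CLEANRUV<rid> invocation described on the wiki
page above; the server URLs, credentials, suffix DN, and replica ID are
placeholders:

    # Sketch only: issue CLEANRUV<rid> on every remaining master via python-ldap.
    # The nsds5task attribute and CLEANRUV<rid> value follow the wiki page above;
    # hosts, credentials, suffix DN and replica ID below are made-up placeholders.
    import ldap

    MASTERS = ["ldap://m1.example.com", "ldap://m2.example.com"]
    REPLICA_DN = 'cn=replica,cn="dc=example,dc=com",cn=mapping tree,cn=config'
    REMOVED_RID = 7  # replica ID of the server that was removed

    for url in MASTERS:
        conn = ldap.initialize(url)
        conn.simple_bind_s("cn=Directory Manager", "password")
        # Writing nsds5task triggers the cleanup task on that master.
        conn.modify_s(REPLICA_DN, [(ldap.MOD_REPLACE, "nsds5task",
                                    [("CLEANRUV%d" % REMOVED_RID).encode()])])
        conn.unbind_s()

Even then, this only works while every master is reachable, which is
exactly the limitation described above.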
In the short term, we need to identify exactly what happens when a
replica is deleted: what gets replicated, how the RUV is handled on the
supplier and the consumer, what changelog db interactions there are,
what the effects on replication are, and how to recover from this
situation. In the long term, we need to make replication much more
resilient and robust in the face of replica removal.
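One starting point for that investigation is to dump the RUV tombstone
entry on each server before and after the replica is deleted and diff
the element lists. A rough sketch, assuming the usual tombstone DN and
nsds50ruv attribute; server URLs and credentials are placeholders:

    # Sketch only: print the nsds50ruv values from the RUV tombstone entry so
    # the per-replica elements can be compared across servers and across the
    # deletion. The tombstone DN and nsds50ruv attribute are assumptions here.
    import ldap

    SUFFIX = "dc=example,dc=com"
    RUV_DN = "nsuniqueid=ffffffff-ffffffff-ffffffff-ffffffff," + SUFFIX

    def dump_ruv(url, binddn="cn=Directory Manager", password="password"):
        conn = ldap.initialize(url)
        conn.simple_bind_s(binddn, password)
        result = conn.search_s(RUV_DN, ldap.SCOPE_BASE,
                               "(objectclass=*)", ["nsds50ruv"])
        conn.unbind_s()
        for dn, attrs in result:
            for val in attrs.get("nsds50ruv", []):
                print(url, val.decode())

    for url in ["ldap://supplier.example.com", "ldap://consumer.example.com"]:
        dump_ruv(url)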
One solution is to somehow "mark" the RUV element, e.g. use a port
number of 0 or some other "magic" value, or use a special max CSN. The
goal here is to do something that won't break older replicas - users
must be able to run a mixed old and new replication topology - so
solutions that begin with "upgrade all servers simultaneously" are
non-starters.
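To make the idea concrete, here is a toy sketch of what a consumer-side
check might look like, assuming the usual
"{replica <rid> ldap://host:port} <mincsn> <maxcsn>" element string and
treating port 0 as the removal marker; the convention itself is only the
proposal above, not existing behavior:

    # Sketch only: parse an nsds50ruv-style element string and treat port 0 in
    # the replica URL as the proposed "this replica was removed" marker.
    import re

    ELEMENT_RE = re.compile(r"\{replica (\d+) ldap://([^:]+):(\d+)\}")

    def is_removed(ruv_element: str) -> bool:
        m = ELEMENT_RE.match(ruv_element)
        return bool(m) and int(m.group(3)) == 0

    print(is_removed("{replica 7 ldap://removed.example.com:0} "
                     "4f0c2a31000000070000 4f0d1b22000000070000"))  # True

An older replica that doesn't know the convention would just see an
element with an unusual port and keep working, which is what makes this
approach backward compatible.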
Longer term, we should investigate whether it is possible to "replicate"
the CLEANRUV operation, or to allow explicit operations on the RUV
tombstone entry that could be replicated. We could change the RUV or RUV
element format to add a version field, plus a field that explicitly
marks an RUV element as deleted. We already have code in the replication
supplier to get the "capabilities" of the consumer; we would just need
to extend this so the supplier knows whether the consumer understands
"versioned" RUVs.
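Roughly, a versioned element might carry an explicit deleted flag while
still being able to render the legacy string for old consumers. A purely
illustrative sketch - the field names and layout are made up, not an
existing format:

    # Sketch only: one possible shape for a "versioned" RUV element. Version 2
    # adds an explicit 'deleted' flag; the supplier would send it only to
    # consumers whose capabilities say they understand versioned RUVs.
    from dataclasses import dataclass

    @dataclass
    class RUVElement:
        version: int      # format version; 1 = legacy, 2 = adds 'deleted'
        rid: int          # replica ID
        url: str          # ldap://host:port of the replica
        min_csn: str
        max_csn: str
        deleted: bool = False  # explicit removal marker instead of port-0 magic

        def to_legacy_string(self) -> str:
            # What an old (version-1) consumer would still be given.
            return "{replica %d %s} %s %s" % (self.rid, self.url,
                                              self.min_csn, self.max_csn)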