389 doesn't seem to handle replica removal very well. The removed
replica's element remains in the RUV on all other servers. The
http://port389.org/wiki/Howto:CLEANRUV task exists to remove that stale
element, but you have to run it on all masters simultaneously, or the
removed replica will show up again, replicated from another master that
still has the RUV element. Furthermore, it may not be possible to run
CLEANRUV if a master is down for some reason.
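For reference, a minimal sketch of driving the task from a script,
assuming the nsds5task / CLEANRUV<rid> invocation described on the wiki
page above; the server URLs, credentials, suffix DN, and replica ID are
placeholders:

    # Sketch only: issue CLEANRUV<rid> on every remaining master via python-ldap.
    # The nsds5task attribute and CLEANRUV<rid> value follow the wiki page above;
    # hosts, credentials, suffix DN and replica ID below are made-up placeholders.
    import ldap

    MASTERS = ["ldap://m1.example.com", "ldap://m2.example.com"]
    REPLICA_DN = 'cn=replica,cn="dc=example,dc=com",cn=mapping tree,cn=config'
    REMOVED_RID = 7  # replica ID of the server that was removed

    for url in MASTERS:
        conn = ldap.initialize(url)
        conn.simple_bind_s("cn=Directory Manager", "password")
        # Writing nsds5task triggers the cleanup task on that master.
        conn.modify_s(REPLICA_DN, [(ldap.MOD_REPLACE, "nsds5task",
                                    [("CLEANRUV%d" % REMOVED_RID).encode()])])
        conn.unbind_s()

Even then, this only works while every master is reachable, which is
exactly the limitation described above.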
In the short term, we need to identify exactly what happens when a
replica is deleted: what gets replicated, how the RUV is handled on the
supplier and the consumer, what changelog db interactions there are,
what the effects on replication are, and how to recover from this
situation. In the long term, we need to make replication much more
resilient and robust in the face of replica removal.
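One starting point for that investigation is to dump the RUV tombstone
entry on each server before and after the replica is deleted and diff
the element lists. A rough sketch, assuming the usual tombstone DN and
nsds50ruv attribute; server URLs and credentials are placeholders:

    # Sketch only: print the nsds50ruv values from the RUV tombstone entry so
    # the per-replica elements can be compared across servers and across the
    # deletion. The tombstone DN and nsds50ruv attribute are assumptions here.
    import ldap

    SUFFIX = "dc=example,dc=com"
    RUV_DN = "nsuniqueid=ffffffff-ffffffff-ffffffff-ffffffff," + SUFFIX

    def dump_ruv(url, binddn="cn=Directory Manager", password="password"):
        conn = ldap.initialize(url)
        conn.simple_bind_s(binddn, password)
        result = conn.search_s(RUV_DN, ldap.SCOPE_BASE,
                               "(objectclass=*)", ["nsds50ruv"])
        conn.unbind_s()
        for dn, attrs in result:
            for val in attrs.get("nsds50ruv", []):
                print(url, val.decode())

    for url in ["ldap://supplier.example.com", "ldap://consumer.example.com"]:
        dump_ruv(url)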
One solution is to somehow "mark" the RUV element, e.g. use a port
number of 0 or some other "magic" value, or use a special max CSN. The
goal here is to do something that won't break older replicas - users
must be able to run a mixed old and new replication topology - so
solutions that begin with "upgrade all servers simultaneously" are
non-starters.
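To make the idea concrete, here is a toy sketch of what a consumer-side
check might look like, assuming the usual
"{replica <rid> ldap://host:port} <mincsn> <maxcsn>" element string and
treating port 0 as the removal marker; the convention itself is only the
proposal above, not existing behavior:

    # Sketch only: parse an nsds50ruv-style element string and treat port 0 in
    # the replica URL as the proposed "this replica was removed" marker.
    import re

    ELEMENT_RE = re.compile(r"\{replica (\d+) ldap://([^:]+):(\d+)\}")

    def is_removed(ruv_element: str) -> bool:
        m = ELEMENT_RE.match(ruv_element)
        return bool(m) and int(m.group(3)) == 0

    print(is_removed("{replica 7 ldap://removed.example.com:0} "
                     "4f0c2a31000000070000 4f0d1b22000000070000"))  # True

An older replica that doesn't know the convention would just see an
element with an unusual port and keep working, which is what makes this
approach backward compatible.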
Longer term, we should investigate whether it is possible to "replicate"
the CLEANRUV operation, or to allow explicit operations on the RUV
tombstone entry that could be replicated. We could change the RUV or RUV
element format to add a version field, plus a field that explicitly
marks an RUV element as deleted. We already have code in the replication
supplier to get the "capabilities" of the consumer; we would just need
to extend this so the supplier knows whether the consumer understands
"versioned" RUVs.
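Roughly, a versioned element might carry an explicit deleted flag while
still being able to render the legacy string for old consumers. A purely
illustrative sketch - the field names and layout are made up, not an
existing format:

    # Sketch only: one possible shape for a "versioned" RUV element. Version 2
    # adds an explicit 'deleted' flag; the supplier would send it only to
    # consumers whose capabilities say they understand versioned RUVs.
    from dataclasses import dataclass

    @dataclass
    class RUVElement:
        version: int      # format version; 1 = legacy, 2 = adds 'deleted'
        rid: int          # replica ID
        url: str          # ldap://host:port of the replica
        min_csn: str
        max_csn: str
        deleted: bool = False  # explicit removal marker instead of port-0 magic

        def to_legacy_string(self) -> str:
            # What an old (version-1) consumer would still be given.
            return "{replica %d %s} %s %s" % (self.rid, self.url,
                                              self.min_csn, self.max_csn)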