Re: Idea to make replication a bit cleaner

On Mon, 2016-06-13 at 16:53 -0400, Mark Reynolds wrote:
> 
> On 06/13/2016 05:33 AM, Ludwig Krispenz wrote:
> > 
> > Hi German,
> > 
> > you are right that IPA is on the safe side: they maintain the last 
> > used replicaID, and when creating a server instance only a higher 
> > replicaID is used. Also, when a server is removed, the removal triggers 
> > a cleanallruv, either from the script or by the topology plugin (>4.3).
> > This is because in IPA all server instance creation and removal is 
> > managed by IPA commands.

From what I have been told, IPA can be affected by this if the same replica file is used to re-provision a replica ... So let's
assume that even though IPA has some "nicer" behaviours, it still cannot be saved from an admin who makes a mistake!
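
For anyone following the thread who hasn't poked at it: the cleanup Ludwig mentions is just a task entry under
cn=cleanallruv,cn=tasks,cn=config. A rough python-ldap sketch of kicking one off by hand is below; the host, suffix, replica id
and credentials are placeholders, and the attribute names are from memory, so check them against your version before trusting it:

import ldap
import ldap.modlist

SUFFIX = "dc=example,dc=com"   # assumption: example suffix
RID = "3"                      # the replica id we want scrubbed from the RUVs

conn = ldap.initialize("ldap://localhost:389")
conn.simple_bind_s("cn=Directory Manager", "password")

# Adding this entry is what starts the CLEANALLRUV task.
task_dn = "cn=clean %s,cn=cleanallruv,cn=tasks,cn=config" % RID
task = {
    "objectClass": [b"top", b"extensibleObject"],
    "cn": [("clean %s" % RID).encode()],
    "replica-base-dn": [SUFFIX.encode()],
    "replica-id": [RID.encode()],
}
conn.add_s(task_dn, ldap.modlist.addModlist(task))

# Watch the task entry with a plain search to see when it completes.

That is exactly the sort of step I'd like admins to never have to know about.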

> > 
> > This framework is not there by default for "admin managed" DS 
> > deployments, and that's what William wants to get improved.

Also very true. But if we fix it for DS, we fix it for IPA, and IPA's management becomes simpler.

> > 
> > William,
> > I'm not sure that the scenario you describe is really as bad as you 
> > think. If a server is removed and the RID is not cleaned, its 
> > component remains in the RUV, but this is just overhead when 
> > examining RUVs; it should not block replication from continuing. If a new 
> > server with the old, removed replicaID is installed, the RUV component 
> > should be reused, the URL replaced, and replication continue as if 
> > there hadn't been updates for this replica ID for a long time. I'm 
> > saying "should", since there might be some cases where the changelog 
> > was purged and an anchor csn for the old/new replicaID cannot be 
> > found. So we need to do some tests, and it would be good to make this safe.
> > One option would be to maintain already used replicaIDs, so at the 
> > init of a new server there could not only be a check for the same 
> > database generation, but also for a valid RUV.

Yes, that could work. So when you create the server, the replica ID, if set, is ignored, and when it first joins replication it's
"set" by the remote peers? That would work, and would avoid the need for CLEANALLRUV.

Part of me wonders if we could make the replica ID a private attribute that an admin could never fudge with ....

> > I think it is difficult to find a trigger for an automated 
> > cleanallruv; we would have to maintain something like a topology view 
> > of the deployment, like the topology plugin in IPA does.


Mmmmmm. I'm getting conflicting stories. On one hand, it sounds okay, because you can still continue replication despite the
re-use, but I'm being told of customer cases where these incidents cause long outages and cleanups. I will get more data on this
to contribute to the discussion, in case there is something I am misunderstanding about the root cause.

> William,
> 
> "So, an have some idea for this. Any change to a replication agreement, 
> should trigger a CLEANALLRUV, before we start the
> agreement. "
> 
> I am assuming you don't mean replication agreement, but instead the 
> "replica configuration".

Yes.
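
To be precise for the archives: by "replica configuration" I mean the cn=replica entry under the mapping tree, the one that
carries nsDS5ReplicaId, not the nsDS5ReplicationAgreement entries beneath it. A quick python-ldap sketch to read it back, with
an example suffix and the usual quoted-DN form assumed:

import ldap

conn = ldap.initialize("ldap://localhost:389")
conn.simple_bind_s("cn=Directory Manager", "password")

# The suffix inside the DN is normally written in the quoted form shown here.
replica_dn = 'cn=replica,cn="dc=example,dc=com",cn=mapping tree,cn=config'
dn, attrs = conn.search_s(
    replica_dn,
    ldap.SCOPE_BASE,
    "(objectClass=*)",
    ["nsDS5ReplicaId", "nsDS5ReplicaType", "nsDS5Flags"],
)[0]
print(dn, attrs)

That entry is where the replica ID lives, so it's the natural place to hang any "this changed, go clean up" behaviour.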

> 
> One scenario where we could trigger a cleanallruv task is when we demote 
> a master to a hub/consumer, as well as when we delete a master replica.  
> This shouldn't be too hard to do, and it makes sense.  I'm not sure if 
> it's something that should always be done automatically though (yet 
> another config option?)

I would prefer it to be a correct, automatic behaviour rather than configurable. As an ex-LDAP admin, I can tell you that I
didn't have the time to investigate all 1000 knobs of DS; I needed it to "just work, and do the right thing". So let's make it
correct out of the box.
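
Until the server does it for us, the "clean on demote" behaviour Mark describes could at least be scripted on the admin side.
A hypothetical sketch with python-ldap follows; the hostnames, suffix, credentials and my reading that nsDS5ReplicaType 3 means
"master" are all assumptions, so verify before using:

import ldap
import ldap.modlist

SUFFIX = "dc=example,dc=com"
REPLICA_DN = 'cn=replica,cn="dc=example,dc=com",cn=mapping tree,cn=config'
OLD_RID = "3"   # the replica id this box used while it was a master

# 1. Check whether the server has actually been demoted (no longer type 3).
demoted = ldap.initialize("ldap://demoted-server.example.com")
demoted.simple_bind_s("cn=Directory Manager", "password")
_dn, attrs = demoted.search_s(REPLICA_DN, ldap.SCOPE_BASE,
                              "(objectClass=*)", ["nsDS5ReplicaType"])[0]
attrs = {k.lower(): v for k, v in attrs.items()}   # be tolerant of attribute name case

if attrs.get("nsds5replicatype", [b""])[0] != b"3":
    # 2. Ask a remaining master to clean the old replica id out of the RUVs.
    master = ldap.initialize("ldap://remaining-master.example.com")
    master.simple_bind_s("cn=Directory Manager", "password")
    task_dn = "cn=clean %s,cn=cleanallruv,cn=tasks,cn=config" % OLD_RID
    entry = {
        "objectClass": [b"top", b"extensibleObject"],
        "cn": [("clean %s" % OLD_RID).encode()],
        "replica-base-dn": [SUFFIX.encode()],
        "replica-id": [OLD_RID.encode()],
    }
    master.add_s(task_dn, ldap.modlist.addModlist(entry))

But this is exactly the sort of glue we shouldn't be asking admins to write, which is why I'd rather it be the default behaviour.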

> 
> Mark
> > 
> > 
> > But it is definitely worth thinking about solutions for this problem.
> > 
> > Ludwig
> > 
> > 
> > On 06/13/2016 10:21 AM, German Parente wrote:
> > > 
> > > Hi William,
> > > 
> > > I think this case is covered in IPA. I have never seen a new replica 
> > > added with the former ID of an old one.
> > > 
> > > The former RUVs are not cleaned automatically in current versions, 
> > > though, and it's not a very severe issue now. There are also IPA 
> > > commands to list and clean the RUVs.
> > > 
> > > I have also heard or read that in the dev versions (not yet 
> > > delivered), the cleaning is automatic, as you are proposing.
> > > 
> > > Thanks a lot.
> > > 
> > > regards,
> > > 
> > > German.
> > > 
> > > 
> > > On Mon, Jun 13, 2016 at 7:21 AM, William Brown <wibrown@xxxxxxxxxx 
> > > <mailto:wibrown@xxxxxxxxxx>> wrote:
> > > 
> > >     Hi,
> > > 
> > >     I was discussing with some staff here in BNE about replication.
> > > 
> > >     It seems a common case is that admins with 2 or 3 servers in MMR
> > >     (both DS and IPA) will do this:
> > > 
> > >     * Setup all three masters A, B, C (replica id 1,2,3 respectively)
> > >     * Run them for a while in replication
> > >     * Remove C from replication
> > >     * Delete data, change the system
> > >     * Re-add C with the same replica id.
> > > 
> > >     Supposedly this can cause duplicate RUV entries for id 3 in
> > >     masters A and B. Of course, this means that replication has all
> > >     kinds of insane issues at this point ....
> > > 
> > > 
> > >     On one hand, this is the admin's fault. But on the other, we
> > >     should handle this. Consider an admin who re-uses an IPA replica
> > >     setup file without running CLEANALLRUV.
> > > 
> > >     So, I have some idea for this. Any change to a replication
> > >     agreement should trigger a CLEANALLRUV, before we start the
> > >     agreement. This means on our local master we have removed the bad
> > >     RUV first, then we can add the RUV of the newly added master
> > >     when needed ....
> > > 
> > >     What do you think? I think that we must handle this better, and
> > >     it should be a non-issue to admins.
> > > 
> > > 
> > >     We can't prevent an admin from intentionally adding duplicate
> > >     IDs to the topology though. So making it so that the IDs are not
> > >     admin controlled would prevent this, but I haven't any good ideas
> > >     about this (yet).
> > > 
> > > 
> > > 
> > > 
> > >     --
> > >     Sincerely,
> > > 
> > >     William Brown
> > >     Software Engineer
> > >     Red Hat, Brisbane
> > > 
> > > 

-- 
Sincerely,

William Brown
Software Engineer
Red Hat, Brisbane

--
389-devel mailing list
389-devel@xxxxxxxxxxxxxxxxxxxxxxx
https://lists.fedoraproject.org/admin/lists/389-devel@xxxxxxxxxxxxxxxxxxxxxxx
