Re: replication in containerized 389ds

> On 16 May 2019, at 12:16, aravind gosukonda <arabha123@xxxxxxxxx> wrote:
> 
> Hello William,
> 
> Thank you for the advice.
>> Hey there! 
>> 
>> Great to hear you want to use this in a container. I have a few things to advise here.
>> 
>> From reading this it looks like you want to have:
>> 
>> [ Container 1 ]    [ Container 2 ]    [ Container 3 ]
>>        |                  |                  |
>> [                   Shared Volume                   ]
>> 
>> So first off, this is *not* possible or supported. Every DS instance needs its own
>> volume, and they replicate to each other:
>> 
>> [ Container 1 ]    [ Container 2 ]    [ Container 3 ]
>>        |                  |                  |
>> [   Volume 1  ]    [   Volume 2  ]    [   Volume 3  ]
>> 
>> You probably also can't autoscale (easily) as a result of this. I'm still working
>> on ideas to address this ... 
>> 
>> But you can manually scale, if you script things properly.
> I have a separate persistent volume mounted to each container, as you suggest. I use a statefulset, so the same volume is mounted across container replacements.
> 
>> Every instance needs its own changelog, and that is related to its replica ID.
>> If you remove a replica there IS a clean up process. Remember, 389 is not designed as a
>> purely stateless app, so you'll need to do some work to manage this. 
> I've set up each instance to have its own changelog, stored in the persistent volume. The scenario I had in mind was a container being deleted and recreated for any reason. My assumption is that the replacement will take a few minutes, or probably hours in the worst case. For all practical purposes, this will be like a reboot of a host running a DS instance. Should I have any checks to see if it's working, or leave it alone and let replication deal with the delay?

A simple way to look at this is to treat every 389 instance in a container as a read-only replica - that simplifies your system a lot (RO instances have a replica ID of 65535, I think). That way, on startup/shutdown you just re-init the RO from an external hub or similar, and you don't care if you delete the volume associated with the container.
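
Something like this is what I mean - a sketch only, with a made-up suffix, hostnames and credentials, and the exact dsconf options may differ between 389-ds-base versions, so check "dsconf ... --help" first:

  # On the container instance: enable the consumer (read-only) role.
  # No replica id is set here, consumers use the reserved read-only id.
  dsconf -D "cn=Directory Manager" ldap://389-ds-2.example.com replication enable \
      --suffix="dc=example,dc=com" --role="consumer" \
      --bind-dn="cn=replication manager,cn=config" --bind-passwd="secret"

  # On the external hub: point an agreement at it and initialise it.
  dsconf -D "cn=Directory Manager" ldap://hub.example.com repl-agmt create \
      --suffix="dc=example,dc=com" --host="389-ds-2.example.com" --port=389 \
      --conn-protocol=LDAP --bind-dn="cn=replication manager,cn=config" \
      --bind-passwd="secret" --bind-method=SIMPLE --init agmt-to-389-ds-2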

If you plan to make your container instances writeable, you should probably not autoscale - treat a container addition/removal the same as adding/removing a host, which requires a RUV clean-up and other maintenance tasks. Think of each persistent volume, with its replica ID, db and changelog, as the "instance"; the container just provides access to it.

So every time you scale up by adding another container, you need to add another persistent volume, with its own unique replica ID, db and changelog, and then set up replication between them.
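
As a sketch (again, made-up names - check the dsconf options against your version):

  # Each instance/volume gets its own unique replica id:
  dsconf -D "cn=Directory Manager" ldap://389-ds-0.example.com replication enable \
      --suffix="dc=example,dc=com" --role="supplier" --replica-id=1 \
      --bind-dn="cn=replication manager,cn=config" --bind-passwd="secret"
  dsconf -D "cn=Directory Manager" ldap://389-ds-1.example.com replication enable \
      --suffix="dc=example,dc=com" --role="supplier" --replica-id=2 \
      --bind-dn="cn=replication manager,cn=config" --bind-passwd="secret"

  # Then create an agreement in each direction, as in the consumer example above.

  # When you permanently remove an instance, clean its replica id out of the RUV:
  dsconf -D "cn=Directory Manager" ldap://389-ds-0.example.com repl-tasks cleanallruv \
      --suffix="dc=example,dc=com" --replica-id=2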

Perhaps what could help me is a diagram of your planned infrastructure? 

> 
>> 
>> You'll need to just assert they exist statefully - ansible can help here.
> Since I'm using persistent volumes, the replication agreements will already be in place if it's a configured instance. It struck me while writing this reply that a container replacement, in my case, will be similar to a host reboot, as all the config/data is available in a persistent volume. In this case, do I need to treat container replacement differently?

To help with this, let's assume:

[ Container 1 ]
         |
[ Volume ID abcd ]

Now say you destroy container 1 and upgrade to a newer version. So long as all your stateful data is in the volume (dse.ldif, db, changelog db), this is fine:

[ Container NEW! ]
         |
[ Volume ID abcd ]

It would act like container 1 did, with the same replica ID etc.
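
In plain docker terms (a kubernetes statefulset with a persistent volume claim does the equivalent for you), with a made-up image name and assuming the image keeps all of its state under a single mount such as /data:

  docker volume create ds-data-abcd
  docker run -d --name container1 -v ds-data-abcd:/data my-389ds-image:old

  # Replace the container but keep the volume - the new container comes up with
  # the same dse.ldif, db and changelog, and therefore the same replica id:
  docker rm -f container1
  docker run -d --name container-new -v ds-data-abcd:/data my-389ds-image:new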

> 
>> What do you mean by "re-init" here? from another replica? The answer is ...
>> "it depends".
> 
> 
>> So many things can go wrong. Every instance needs its own volume, and data is shared
>> via replication. 
>> 
>> Right now, my effort for containerisation has been to help support running 389 in atomic
>> host or suse transactional server. Running in kubernetes "out of the box" is a
>> stretch goal at the moment, but if you are willing to tackle it, I'd fully help and
>> support you to upstream some of that work. 
>> 
>> 
>> Most likely, you'll need to roll your own image, and you'll need to do some work in
>> dscontainer (our python init tool) to support adding/removing of replicas, configuration
>> of the replicaid, and the replication passwords. 
> Since I started this project a while ago, I have been using a base image and installing 389 on top of it, with some modifications taken from https://github.com/dabelenda/container-389ds/blob/master/Dockerfile, which disable hostname checks, remove the systemd-based startup, etc. I'm using Kubernetes secrets to store the passwords for the directory manager, replication manager, etc. For replica ID configuration, as I'm using a StatefulSet which spins up containers with names like 389-ds-0, 389-ds-1, 389-ds-2, I read the hostname of the container and generate the replica ID from it. I haven't yet tried the dscontainer tool, which I see does some of the things the linked Dockerfile does, and a lot more too.

It would be great to have some more testing of the dscontainer tool too, so please see how that goes. You can get the latest version by using opensuse/tumbleweed:latest as a Docker base image and just running "zypper in 389-ds-base". If you want even newer versions, you can look at the network:ldap repo - I'm happy to help provide Dockerfile advice for these cases. These images assume all your state is in /data, so provided you have that, you can work as per the example above.
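
On generating the replica ID from the statefulset name: something small like this in your startup script is probably all you need (a sketch only - the offset and variable names are entirely your choice):

  # 389-ds-2 -> ordinal 2; replica ids must be unique and greater than 0,
  # so apply an offset:
  ORDINAL="${HOSTNAME##*-}"
  REPLICA_ID=$((ORDINAL + 1))
  echo "Using replica id ${REPLICA_ID} for ${HOSTNAME}"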

> 
>> 
>> At a guess, your POD architecture should be 1 HUB which receives all incoming replication
>> traffic, and then the HUB dynamically adds/removes agreements to the consumers, and
>> manages them. The consumers are then behind the haproxy instance that is part of kube. 
>> 
>> Your writeable servers should probably still be outside of this system for the moment :) 
>> 
>> 
>> Does that help? I'm really happy to answer any questions, help with planning and
>> improve our container support upstream with you. 
>> 
>> Thanks, 
>> 
>> —
>> Sincerely,
>> 
>> William Brown
>> 
>> Senior Software Engineer, 389 Directory Server
>> SUSE Labs
> 
> Thanks,
> Aravind

—
Sincerely,

William Brown

Senior Software Engineer, 389 Directory Server
SUSE Labs
_______________________________________________
389-users mailing list -- 389-users@xxxxxxxxxxxxxxxxxxxxxxx
To unsubscribe send an email to 389-users-leave@xxxxxxxxxxxxxxxxxxxxxxx
Fedora Code of Conduct: https://getfedora.org/code-of-conduct.html
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: https://lists.fedoraproject.org/archives/list/389-users@xxxxxxxxxxxxxxxxxxxxxxx



