Hi guys, any updates here?

On Sun, Jun 17, 2012 at 12:59 PM, Igor Laskovy <igor.laskovy@xxxxxxxxx> wrote:
> John, Jason, can you please clarify concretely what these bad things are?
> For example, the worst case.
>
> Yoshi, Kei, can you please clarify the current status of Kemari? How far
> is it from production usage?
>
> On Fri, Jun 15, 2012 at 5:48 PM, Jason Hedden <jhedden@xxxxxxxxxxx> wrote:
>> I'm running 2 full nova controllers behind an NGINX load balancer. While
>> there is still that chance of half-completed tasks, it's been working
>> very well.
>>
>> Each nova controller is running (full time) nova-scheduler, nova-cert,
>> keystone, and 6 nova-api processes. All API requests go through NGINX,
>> which reverse-proxies the traffic to these 2 systems.
>>
>> Example NGINX nova-api config:
>>
>> upstream nova-api {
>>     server hostA:8774  fail_timeout=30s;
>>     server hostB:8774  fail_timeout=30s;
>>     server hostA:18774 fail_timeout=30s;
>>     server hostB:18774 fail_timeout=30s;
>>     server hostA:28774 fail_timeout=30s;
>>     server hostB:28774 fail_timeout=30s;
>>     server hostA:38774 fail_timeout=30s;
>>     server hostB:38774 fail_timeout=30s;
>>     server hostA:48774 fail_timeout=30s;
>>     server hostB:48774 fail_timeout=30s;
>>     server hostA:58774 fail_timeout=30s;
>>     server hostB:58774 fail_timeout=30s;
>> }
>>
>> server {
>>     listen x.x.x.x:8774;
>>     server_name public.name;
>>
>>     location / {
>>         proxy_pass http://nova-api;
>>         proxy_set_header Host "public.address:8774";
>>         proxy_set_header X-Real-IP $remote_addr;
>>         proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
>>     }
>> }
>>
>> Attached is a diagram that gives a brief overview of the HA environment
>> I've set up.
>>
>> --Jason Hedden
>>
>> On Jun 15, 2012, at 5:36 AM, John Garbutt wrote:
>>
>>> I know there is some work in the XenAPI driver to make it resilient to
>>> these kinds of failures (to allow frequent updates of the nova code),
>>> and I think there were plans for that work to be reused in the libvirt
>>> driver.
>>>
>>> AFAIK, in Essex and lower, bad things can happen if you don't wait for
>>> all the tasks to finish. You may well be OK some of the time.
>>>
>>> It boils down to an issue of consuming a message from Rabbit but not
>>> completing the task, and not being able to recover from half-completed
>>> tasks.
>>>
>>> Hope that helps,
>>> John
>>>
>>> From: Igor Laskovy [mailto:igor.laskovy@xxxxxxxxx]
>>> Sent: 15 June 2012 11:31
>>> To: Christian Parpart
>>> Cc: John Garbutt; openstack-operators@xxxxxxxxxxxxxxxxxxx;
>>> <openstack@xxxxxxxxxxxxxxxxxxx>
>>> Subject: Re: [Openstack-operators] Nova Controller HA issues
>>>
>>> I have been using OpenStack in my little lab for only a short time
>>> too :)
>>>
>>> OK, you are right of course, but I meant a somewhat different design
>>> when I talked about virtualizing the controller nodes.
>>>
>>> It could be just two dedicated hypervisors with a dedicated share/DRBD
>>> between them. These hypervisors would be standalone, not part of nova.
>>> Then maybe Pacemaker or another tool could provide the availability
>>> function and restart the VM on the surviving node when the active one
>>> dies.
>>>
>>> The main question here is how bad it can get if the controller node
>>> suffers an unexpected power-off. In other words, when the VM restarts
>>> it will be in a crash-consistent state.
>>> Will some nova services lose anything here?
>>> Will RabbitMQ lose some data here? (I am new to RabbitMQ too.)
>>>
>>> Igor Laskovy
>>> facebook.com/igor.laskovy
>>> Kiev, Ukraine
>>>
>>> On Jun 15, 2012 10:54 AM, "Christian Parpart" <trapni@xxxxxxxxx> wrote:
>>> Hey,
>>>
>>> well, I said "I might be wrong" because I have no "clear" vision of how
>>> OpenStack works in its deepest detail; however, I would not like to
>>> depend on a controller node that is inside a virtual machine, controlled
>>> by compute nodes, which are in turn controlled by the controller node.
>>> This sounds quite like a chicken-and-egg problem.
>>>
>>> However, at the time of this writing, I think you'll have to have a
>>> working nova-scheduler process, which is responsible for deciding which
>>> compute node to spawn your VM on (what else?). And think about what you
>>> do when this (or all of your controller) VMs die terribly and you want
>>> to rebuild them - how do you plan to do that when your controller node
>>> is out of service?
>>>
>>> In my case I have put the controller services onto two compute nodes
>>> and use Pacemaker to switch between them; in case one node goes down,
>>> the other can take over (via a shared service IP).
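>>> To make that concrete, here is a minimal sketch of such a floating
>>> service IP in crm shell syntax (the resource name and address are only
>>> illustrative, not my actual values):
>>>
>>> # virtual IP that Pacemaker moves to whichever node is alive;
>>> # clients always reach the controller services through this address
>>> primitive p_nova_vip ocf:heartbeat:IPaddr2 \
>>>     params ip="10.0.0.100" cidr_netmask="24" \
>>>     op monitor interval="10s"
>>>
>>> The controller services themselves can then be grouped with that VIP so
>>> they always fail over together.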
>>> Again, these are just my thoughts, and I have been using OpenStack for
>>> only about a month now :-)
>>> But I hope this helps a bit...
>>>
>>> Best regards,
>>> Christian Parpart.
>>>
>>> On Fri, Jun 15, 2012 at 8:16 AM, Igor Laskovy <igor.laskovy@xxxxxxxxx>
>>> wrote:
>>> Why? Can you please clarify?
>>>
>>> Igor Laskovy
>>> facebook.com/igor.laskovy
>>> Kiev, Ukraine
>>>
>>> On Jun 15, 2012 1:55 AM, "Christian Parpart" <trapni@xxxxxxxxx> wrote:
>>> I don't think putting the controller node completely into a VM is good
>>> advice, at least when speaking of nova-scheduler and nova-api (if
>>> central).
>>>
>>> I may be wrong, and if so, please correct me.
>>>
>>> Christian.
>>>
>>> On Thu, Jun 14, 2012 at 7:20 PM, Igor Laskovy <igor.laskovy@xxxxxxxxx>
>>> wrote:
>>> Hi, are there any updates here?
>>> Can anybody clarify what happens if a controller node just goes through
>>> a hard shutdown?
>>>
>>> I am thinking about a solution with two hypervisors and putting the
>>> controller node in a VM on shared storage, so it can be relaunched when
>>> the active hypervisor dies.
>>> Any ideas or advice?
>>>
>>> On Tue, Jun 12, 2012 at 3:52 PM, John Garbutt <John.Garbutt@xxxxxxxxxx>
>>> wrote:
>>> > Sure, I get your point.
>>> >
>>> > I think Florian is working on some docs to help with that.
>>> >
>>> > Not sure how much has been done already.
>>> >
>>> > Cheers,
>>> > John
>>> >
>>> > From: Christian Parpart [mailto:trapni@xxxxxxxxx]
>>> > Sent: 12 June 2012 13:47
>>> > To: John Garbutt
>>> > Cc: openstack-operators@xxxxxxxxxxxxxxxxxxx
>>> > Subject: Re: [Openstack-operators] Nova Controller HA issues
>>> >
>>> > Hey, yeah, I also found that page, but didn't find it that helpful
>>> > yet; it reads much more like a theoretical paper on how they
>>> > implemented it than a guide to how to actually make it happen (from
>>> > the sysop point of view :-).
>>> >
>>> > I had hoped that someone had faced this already, since I really find
>>> > it very unintuitive to get right, or I'll need to wait until I get
>>> > more time to investigate it properly. :-)
>>> >
>>> > Regards,
>>> > Christian.
>>> >
>>> > On Tue, Jun 12, 2012 at 12:52 PM, John Garbutt
>>> > <John.Garbutt@xxxxxxxxxx> wrote:
>>> >
>>> > I thought Rabbit had a built-in HA solution these days:
>>> > http://www.rabbitmq.com/ha.html
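>>> > Roughly, the idea (a sketch, untested, and the exact commands depend
>>> > on your RabbitMQ version; host names here are illustrative) is to
>>> > cluster the brokers first and then mark the queues as mirrored. On
>>> > RabbitMQ 3.0+ the mirroring can be set with a policy:
>>> >
>>> > # on the second node, join it to the existing cluster
>>> > rabbitmqctl stop_app
>>> > rabbitmqctl join_cluster rabbit@hostA
>>> > rabbitmqctl start_app
>>> >
>>> > # mirror every queue across all cluster nodes
>>> > rabbitmqctl set_policy ha-all ".*" '{"ha-mode":"all"}'
>>> >
>>> > On the 2.x releases there is no set_policy; each queue instead has to
>>> > be declared with the x-ha-policy argument, as that page describes.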
>>> > From: openstack-operators-bounces@xxxxxxxxxxxxxxxxxxx
>>> > [mailto:openstack-operators-bounces@xxxxxxxxxxxxxxxxxxx] On Behalf Of
>>> > Christian Parpart
>>> > Sent: 12 June 2012 09:59
>>> > To: openstack-operators@xxxxxxxxxxxxxxxxxxx
>>> > Subject: [Openstack-operators] Nova Controller HA issues
>>> >
>>> > Hi all,
>>> >
>>> > after spending the whole evening making our cloud controller node
>>> > highly available using Corosync/Pacemaker - of which I am really
>>> > proud - I have just a few problems left, and the one that freaks me
>>> > out the most is rabbitmq-server.
>>> >
>>> > For that beast I just can't seem to find any good documentation on
>>> > how to set rabbitmq-server up properly for HA.
>>> >
>>> > Has anyone ever tried to set up a nova controller (including the
>>> > rabbitmq dependency) for HA? If so, I'd be pleased to share
>>> > experiences, especially on the latter part. :-)
>>> >
>>> > Best regards,
>>> > Christian Parpart
>>> >
>>> > _______________________________________________
>>> > Openstack-operators mailing list
>>> > Openstack-operators@xxxxxxxxxxxxxxxxxxx
>>> > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
>>>
>>> --
>>> Igor Laskovy
>>> Kiev, Ukraine
>>>
>>> _______________________________________________
>>> Mailing list: https://launchpad.net/~openstack
>>> Post to     : openstack@xxxxxxxxxxxxxxxxxxx
>>> Unsubscribe : https://launchpad.net/~openstack
>>> More help   : https://help.launchpad.net/ListHelp
>>
>> _______________________________________________
>> Openstack-operators mailing list
>> Openstack-operators@xxxxxxxxxxxxxxxxxxx
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
>
> --
> Igor Laskovy
> facebook.com/igor.laskovy
> Kiev, Ukraine

--
Igor Laskovy
facebook.com/igor.laskovy
Kiev, Ukraine
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html