Re: [ovirt-users] Hosted-Engine HA problem

On 10/31/2014 03:53 AM, Jaicel R. Sabonsolin wrote:
Hi guys,

These logs appear on both hosts, just like the result of --vm-status. I tried running tcpdump on the oVirt hosts and Gluster nodes, but only packets exchanged with my monitoring VM (Zabbix) appeared.

agent.log
     new_data = self.refresh(self._state.data)
   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/state_machine.py", line 77, in refresh
     stats.update(self.hosted_engine.collect_stats())
   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 662, in collect_stats
     constants.SERVICE_TYPE)
   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 171, in get_stats_from_storage
     result = self._checked_communicate(request)
   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 199, in _checked_communicate
     .format(message or response))
RequestError: Request failed: <type 'exceptions.OSError'>

broker.log
   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 165, in handle
     response = "success " + self._dispatch(data)
   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 261, in _dispatch
     .get_all_stats_for_service_type(**options)
   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 41, in get_all_stats_for_service_type
     d = self.get_raw_stats_for_service_type(storage_dir, service_type)
   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 74, in get_raw_stats_for_service_type
     f = os.open(path, direct_flag | os.O_RDONLY)
OSError: [Errno 24] Too many open files: '/rhev/data-center/mnt/gluster1:_engine/6eb220be-daff-4785-8f78-111cc24139c4/ha_agent/hosted-engine.metadata'

- Ah, there we go ^^^^^^ You might need to raise the limit on open files as described here [1], or find out which application keeps so many files open.


--Jirka

[1] http://www.cyberciti.biz/faq/linux-increase-the-maximum-number-of-open-files/
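
For the second option, a minimal Python sketch (not part of the original reply) for checking how close a process is to its open-files limit. It assumes a Linux host with /proc and that you already know the PID of the process you suspect, for example the HA broker. The function names are made up for illustration, and reading another process's /proc entries requires root or the user that owns the process.

```python
import os


def count_open_fds(pid):
    """Return the number of file descriptors currently open in process `pid`."""
    return len(os.listdir("/proc/%d/fd" % pid))


def soft_nofile_limit(pid):
    """Return the soft 'Max open files' limit of `pid`, or None if unlimited."""
    with open("/proc/%d/limits" % pid) as limits:
        for line in limits:
            if line.startswith("Max open files"):
                # Columns: "Max open files  <soft>  <hard>  files"
                value = line.split()[3]
                if value == "unlimited":
                    return None
                return int(value)
    raise RuntimeError("no 'Max open files' line in /proc/%d/limits" % pid)


def fd_report(pid):
    """Return a one-line summary of descriptor usage for `pid`."""
    used = count_open_fds(pid)
    limit = soft_nofile_limit(pid)
    limit_text = "unlimited" if limit is None else str(limit)
    return "pid %d: %d open descriptors, soft limit %s" % (pid, used, limit_text)
```

Calling print(fd_report(pid)) a few minutes apart shows whether the count keeps climbing toward the limit. If it does, listing /proc/<pid>/fd shows which files the descriptors point at.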

Thread-38160::INFO::2014-10-31 10:28:37,989::listener::184::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) Connection closed
Thread-38161::INFO::2014-10-31 10:28:53,656::listener::134::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(setup) Connection established
Thread-38161::ERROR::2014-10-31 10:28:53,657::listener::190::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) Error handling request, data: 'get-stats storage_dir=/rhev/data-center/mnt/gluster1:_engine/6eb220be-daff-4785-8f78-111cc24139c4/ha_agent service_type=hosted-engine'
Traceback (most recent call last):
   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 165, in handle
     response = "success " + self._dispatch(data)
   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 261, in _dispatch
     .get_all_stats_for_service_type(**options)
   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 41, in get_all_stats_for_service_type
     d = self.get_raw_stats_for_service_type(storage_dir, service_type)
   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 74, in get_raw_stats_for_service_type
     f = os.open(path, direct_flag | os.O_RDONLY)
OSError: [Errno 24] Too many open files: '/rhev/data-center/mnt/gluster1:_engine/6eb220be-daff-4785-8f78-111cc24139c4/ha_agent/hosted-engine.metadata'
Thread-38161::INFO::2014-10-31 10:28:53,658::listener::184::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) Connection closed
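
Errno 24 (EMFILE) means the process making the os.open() call has reached its own per-process descriptor limit. A minimal sketch, not the actual storage_broker.py code, of the pattern that keeps a repeated os.open() from accumulating descriptors: release the descriptor in a finally block. The function name and read size are illustrative, and the direct-I/O flag the broker passes is left out because it requires aligned buffers. Whether the broker is missing such a close is not established in this thread.

```python
import os


def read_block(path, size=4096):
    """Read up to `size` bytes from `path` and always release the descriptor."""
    fd = os.open(path, os.O_RDONLY)
    try:
        return os.read(fd, size)
    finally:
        # Runs even if os.read() raises, so no descriptor is left behind.
        os.close(fd)
```

With this pattern the function can be called far more often than the open-files limit allows descriptors, because each call gives its descriptor back before returning.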

Thanks,
Jaicel

----- Original Message -----
From: "Niels de Vos" <ndevos@xxxxxxxxxx>
To: "Vijay Bellur" <vbellur@xxxxxxxxxx>
Cc: "Jiri Moskovcak" <jmoskovc@xxxxxxxxxx>, "Jaicel R. Sabonsolin" <jaicel@xxxxxxxxxxxxxxxx>, users@xxxxxxxxx, "Gluster Devel" <gluster-devel@xxxxxxxxxxx>
Sent: Friday, October 31, 2014 4:11:25 AM
Subject: Re: [ovirt-users] Hosted-Engine HA problem

On Thu, Oct 30, 2014 at 09:07:24PM +0530, Vijay Bellur wrote:
On 10/30/2014 06:45 PM, Jiri Moskovcak wrote:
On 10/30/2014 09:22 AM, Jaicel R. Sabonsolin wrote:
Hi Guys,

I need help with my oVirt Hosted-Engine HA setup. I am running 2
oVirt hosts and 2 Gluster nodes with replicated volumes. I already have
VMs running on my hosts, and they migrate normally when I, for example,
power off the host they are running on. The problem is that the
engine does not migrate when I switch off the host that hosts the engine.

    oVirt     3.4.3-1.el6
    KVM       0.12.1.2 - 2.415.el6_5.10
    LIBVIRT   libvirt-0.10.2-29.el6_5.9
    VDSM      vdsm-4.14.17-0.el6


Right now, I get this result from hosted-engine --vm-status:

      File "/usr/lib64/python2.6/runpy.py", line 122, in _run_module_as_main
        "__main__", fname, loader, pkg_name)
      File "/usr/lib64/python2.6/runpy.py", line 34, in _run_code
        exec code in run_globals
      File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_setup/vm_status.py", line 111, in <module>
        if not status_checker.print_status():
      File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_setup/vm_status.py", line 58, in print_status
        all_host_stats = ha_cli.get_all_host_stats()
      File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/client/client.py", line 137, in get_all_host_stats
        return self.get_all_stats(self.StatModes.HOST)
      File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/client/client.py", line 86, in get_all_stats
        constants.SERVICE_TYPE)
      File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 171, in get_stats_from_storage
        result = self._checked_communicate(request)
      File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 199, in _checked_communicate
        .format(message or response))
    ovirt_hosted_engine_ha.lib.exceptions.RequestError: Request failed: <type 'exceptions.OSError'>


Restarting ha-broker and ha-agent normalizes the status, but eventually
it becomes "false" and then returns to the result above. I hope you
guys can help me with this.


Hi Jaicel,
please attach agent.log and broker.log from the host where you are trying
to run hosted-engine --vm-status. I have a feeling that you ran into a
known problem on Gluster, a stalled file descriptor. In that case the
only known solution at this time is to restart the broker and agent, as
you have already found out.


Adding Niels and gluster-devel to troubleshoot from Gluster NFS perspective.

I'd welcome any details on this "stalled file descriptor" problem. Is
there a bug filed with some details like logs, sysrq-t and maybe even
tcpdumps? If there is an easy way to reproduce this behaviour, I can
surely look into it and hopefully come up with some advice or a fix.

Thanks,
Niels

