Hi guys,

These logs appear on both hosts, as does the --vm-status result. I tried
running tcpdump on the oVirt hosts and the Gluster nodes, but only packets
exchanged with my monitoring VM (Zabbix) appeared.

agent.log

    new_data = self.refresh(self._state.data)
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/state_machine.py", line 77, in refresh
    stats.update(self.hosted_engine.collect_stats())
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 662, in collect_stats
    constants.SERVICE_TYPE)
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 171, in get_stats_from_storage
    result = self._checked_communicate(request)
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 199, in _checked_communicate
    .format(message or response))
RequestError: Request failed: <type 'exceptions.OSError'>

broker.log

  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 165, in handle
    response = "success " + self._dispatch(data)
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 261, in _dispatch
    .get_all_stats_for_service_type(**options)
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 41, in get_all_stats_for_service_type
    d = self.get_raw_stats_for_service_type(storage_dir, service_type)
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 74, in get_raw_stats_for_service_type
    f = os.open(path, direct_flag | os.O_RDONLY)
OSError: [Errno 24] Too many open files: '/rhev/data-center/mnt/gluster1:_engine/6eb220be-daff-4785-8f78-111cc24139c4/ha_agent/hosted-engine.metadata'
Thread-38160::INFO::2014-10-31 10:28:37,989::listener::184::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) Connection closed
Thread-38161::INFO::2014-10-31 10:28:53,656::listener::134::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(setup) Connection established
Thread-38161::ERROR::2014-10-31 10:28:53,657::listener::190::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) Error handling request, data: 'get-stats storage_dir=/rhev/data-center/mnt/gluster1:_engine/6eb220be-daff-4785-8f78-111cc24139c4/ha_agent service_type=hosted-engine'
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 165, in handle
    response = "success " + self._dispatch(data)
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 261, in _dispatch
    .get_all_stats_for_service_type(**options)
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 41, in get_all_stats_for_service_type
    d = self.get_raw_stats_for_service_type(storage_dir, service_type)
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 74, in get_raw_stats_for_service_type
    f = os.open(path, direct_flag | os.O_RDONLY)
OSError: [Errno 24] Too many open files: '/rhev/data-center/mnt/gluster1:_engine/6eb220be-daff-4785-8f78-111cc24139c4/ha_agent/hosted-engine.metadata'
Thread-38161::INFO::2014-10-31 10:28:53,658::listener::184::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) Connection closed

Thanks,
Jaicel

----- Original Message -----
From: "Niels de Vos" <ndevos@xxxxxxxxxx>
To: "Vijay Bellur" <vbellur@xxxxxxxxxx>
Cc: "Jiri Moskovcak" <jmoskovc@xxxxxxxxxx>, "Jaicel R. Sabonsolin" <jaicel@xxxxxxxxxxxxxxxx>, users@xxxxxxxxx, "Gluster Devel" <gluster-devel@xxxxxxxxxxx>
Sent: Friday, October 31, 2014 4:11:25 AM
Subject: Re: [ovirt-users] Hosted-Engine HA problem

On Thu, Oct 30, 2014 at 09:07:24PM +0530, Vijay Bellur wrote:
> On 10/30/2014 06:45 PM, Jiri Moskovcak wrote:
> >On 10/30/2014 09:22 AM, Jaicel R. Sabonsolin wrote:
> >>Hi Guys,
> >>
> >>I need help with my ovirt Hosted-Engine HA setup. I am running on 2
> >>ovirt hosts and 2 gluster nodes with replicated volumes. i already have
> >>VMs running on my hosts and they can migrate normally once i for example
> >>power off the host that they are running on. the problem is that the
> >>engine can't migrate once i switch off the host that hosts the engine.
> >>
> >> oVirt 3.4.3-1.el6
> >> KVM 0.12.1.2 - 2.415.el6_5.10
> >> LIBVIRT libvirt-0.10.2-29.el6_5.9
> >> VDSM vdsm-4.14.17-0.el6
> >>
> >>
> >>right now, i have this result from hosted-engine --vm-status.
> >>
> >>  File "/usr/lib64/python2.6/runpy.py", line 122, in _run_module_as_main
> >>    "__main__", fname, loader, pkg_name)
> >>  File "/usr/lib64/python2.6/runpy.py", line 34, in _run_code
> >>    exec code in run_globals
> >>  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_setup/vm_status.py", line 111, in <module>
> >>    if not status_checker.print_status():
> >>  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_setup/vm_status.py", line 58, in print_status
> >>    all_host_stats = ha_cli.get_all_host_stats()
> >>  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/client/client.py", line 137, in get_all_host_stats
> >>    return self.get_all_stats(self.StatModes.HOST)
> >>  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/client/client.py", line 86, in get_all_stats
> >>    constants.SERVICE_TYPE)
> >>  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 171, in get_stats_from_storage
> >>    result = self._checked_communicate(request)
> >>  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 199, in _checked_communicate
> >>    .format(message or response))
> >>ovirt_hosted_engine_ha.lib.exceptions.RequestError: Request failed: <type 'exceptions.OSError'>
> >>
> >>
> >>restarting ha-broker and ha-agent normalizes the status but eventually
> >>it would become "false" and then return to the result above. hope you
> >>guys could help me with this.
> >>
> >
> >Hi Jaicel,
> >please attach agent.log and broker.log from the host where you trying to
> >run hosted-engine --vm-status. I have a feeling that you ran into a
> >known problem on gluster - stalled file descriptor, in that case the
> >only known solution at this time is to restart the broker & agent as you
> >have already found out.
> >
>
> Adding Niels and gluster-devel to troubleshoot from Gluster NFS perspective.

I'd welcome any details on this "stalled file descriptor" problem. Is
there a bug filed with some details like logs, sysrq-t and maybe even
tcpdumps?

If there is an easy way to reproduce this behaviour, I can surely look
into it and hopefully come up with some advice or a fix.

Thanks,
Niels
_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
http://supercolony.gluster.org/mailman/listinfo/gluster-devel
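
A note for anyone following up on the "[Errno 24] Too many open files" error in broker.log: one way to check whether a process is accumulating file descriptors is to count the entries under /proc/<pid>/fd at intervals. The following is a minimal sketch, assuming a Linux host with /proc mounted and permission to read the target process's fd directory. The helper names count_open_fds and fds_pointing_at are hypothetical and are not part of ovirt_hosted_engine_ha.

```python
import os


def count_open_fds(pid):
    """Return how many file descriptors the process currently holds.

    Assumption: Linux, where /proc/<pid>/fd has one entry per open fd.
    """
    return len(os.listdir("/proc/%d/fd" % pid))


def fds_pointing_at(pid, needle):
    """Return the targets of all fds whose resolved path contains needle."""
    fd_dir = "/proc/%d/fd" % pid
    matches = []
    for name in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The fd was closed between listdir() and readlink(); skip it.
            continue
        if needle in target:
            matches.append(target)
    return matches
```

Calling count_open_fds(pid) with the broker's PID a few minutes apart would show whether the count keeps growing, and fds_pointing_at(pid, "hosted-engine.metadata") would show how many of those descriptors refer to the metadata file named in the traceback. A steadily rising count would be consistent with a descriptor leak, but it does not by itself identify the cause.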