Re: funcd not restarting - "Address already in use"

Check the pyOpenSSL version: upgrade pyOpenSSL on all machines to pyOpenSSL-0.6-2.el5.
Check the hosts as well: make sure all machines can resolve each other.

On Thu, Oct 20, 2011 at 12:07 AM, Alison Young <alison.young@xxxxxxxxxxx> wrote:
Hello,

We are seeing an occasional problem where restarting funcd on the minions fails: the func daemon stops but is then unable to start again.

Checking func.log gives:

2011-10-02 04:02:04,321 - INFO - Exception occured: socket.error
2011-10-02 04:02:04,321 - INFO - Exception value: (98, 'Address already in use')
2011-10-02 04:02:04,322 - INFO - Exception Info:
  File "/usr/bin/funcd", line 23, in ?
    server.main(sys.argv)
  File "/usr/lib/python2.4/site-packages/func/minion/server.py", line 413, in main
    serve()
  File "/usr/lib/python2.4/site-packages/func/minion/server.py", line 225, in serve
    server = setup_server()
  File "/usr/lib/python2.4/site-packages/func/minion/server.py", line 220, in setup_server
    server = FuncSSLXMLRPCServer((listen_addr, listen_port), config.module_list)
  File "/usr/lib/python2.4/site-packages/func/minion/server.py", line 279, in __init__
    self.ca)
  File "/usr/lib/python2.4/site-packages/func/minion/AuthedXMLRPCServer.py", line 74, in __init__
    SimpleXMLRPCServer.SimpleXMLRPCServer.__init__(self, address, AuthedSimpleXMLRPCRequestHandler)
  File "/usr/lib64/python2.4/SimpleXMLRPCServer.py", line 473, in __init__
    SocketServer.TCPServer.__init__(self, addr, requestHandler)
  File "/usr/lib64/python2.4/SocketServer.py", line 330, in __init__
    self.server_bind()
  File "/usr/lib64/python2.4/SocketServer.py", line 341, in server_bind
    self.socket.bind(self.server_address)
  File "<string>", line 1, in bind


As you may guess from the timestamp, we see this problem most often at 4:02am on Sundays, i.e. as part of the logrotate of the func logs. Logging in to the server and starting the func service once we spot that it has stopped has always worked so far, without needing to remove any pid or lock file by hand.

One theory is that this problem occurs when the func minion is processing a command and is told to restart part way through. From watching netstat, it looks like the func daemon stops listening on the minion port to allow the spawned process to communicate with the master. If the daemon stops at that point, the spawned process blocks a new daemon from starting ('Address already in use'), but that spawned process then exits and we're left with no daemons.
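
To make the theory above concrete, here is a minimal sketch (modern Python, not func code) of how a second holder of a listening socket can block a new bind after the original owner has gone. It uses socket.dup() as a stand-in for a descriptor inherited by a spawned process; whether funcd actually leaves such a descriptor open in its child is an assumption, and the names try_bind and demo are made up for illustration.

```python
import errno
import socket


def try_bind(port):
    """Bind and listen on 127.0.0.1:port; return None on success, else the errno."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        s.bind(("127.0.0.1", port))
        s.listen(1)
        return None
    except OSError as exc:
        return exc.errno
    finally:
        s.close()


def demo():
    # The "daemon" listens on a free port picked by the kernel.
    daemon = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    daemon.bind(("127.0.0.1", 0))
    daemon.listen(1)
    port = daemon.getsockname()[1]

    # Stand-in for a spawned process that still holds the listening descriptor.
    inherited = daemon.dup()

    # The daemon is stopped, but the port stays in LISTEN via the duplicate.
    daemon.close()
    while_child_alive = try_bind(port)

    # The spawned process exits; nothing holds the port any more.
    inherited.close()
    after_child_exit = try_bind(port)

    return while_child_alive, after_child_exit


if __name__ == "__main__":
    blocked, freed = demo()
    print("bind while child holds the socket:", errno.errorcode.get(blocked, blocked))
    print("bind after child exits:", "ok" if freed is None else freed)
```

If the sketch matches what happens on the minions, a restart issued in the window between the two try_bind calls would fail in the same way as the traceback, and a start issued a little later would succeed, which fits the observation that a manual start always works.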

Does this ring any bells with anyone? Is this a known bug?

We've already added monit to mop up after this, but it'd be much preferable to find a proper fix.
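
As a stopgap in the same spirit as the monit workaround, a restart wrapper could wait for the minion port to be released before starting the daemon again. The following is only a sketch under assumptions: port_is_free and wait_for_port_free are hypothetical helper names, the port number and listen address have to be supplied by the caller, and there is still a small race between the probe and the real start.

```python
import socket
import time


def port_is_free(port, host="0.0.0.0"):
    """Return True if a TCP socket can currently be bound to (host, port)."""
    probe = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        probe.bind((host, port))
        return True
    except OSError:
        return False
    finally:
        probe.close()


def wait_for_port_free(port, timeout=30.0, interval=0.5, host="0.0.0.0"):
    """Poll until the port can be bound or the timeout expires.

    Returns True if the port became free, False if it was still busy
    when the timeout ran out.
    """
    deadline = time.monotonic() + timeout
    while True:
        if port_is_free(port, host):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A wrapper would call wait_for_port_free with the minion's listen port between the stop and the start, and only start the daemon when it returns True. This does not explain why the port is still held, so it is a workaround rather than the proper fix asked about.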

Alison

_______________________________________________
Func-list mailing list
Func-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/func-list



--
--------------------------

马新成 | Jackie Ma

MSN: jacknet.ma@hotmail.com   QQ: 2252339967
Twitter: @JackieMa2   G+:  Jackie Ma
My_web: http://jackiema.blog.chinaunix.net 

              http://cn.linkedin.com/in/jacknet

Making IT operations simple, convenient, and intelligent; improving operations efficiency and saving manpower
