Hello,

We are seeing an occasional problem where restarts of funcd on the minions are not successful: the func daemon is stopped but is not able to start again. Checking func.log gives:

2011-10-02 04:02:04,321 - INFO - Exception occured: socket.error
2011-10-02 04:02:04,321 - INFO - Exception value: (98, 'Address already in use')
2011-10-02 04:02:04,322 - INFO - Exception Info:
  File "/usr/bin/funcd", line 23, in ?
    server.main(sys.argv)
  File "/usr/lib/python2.4/site-packages/func/minion/server.py", line 413, in main
    serve()
  File "/usr/lib/python2.4/site-packages/func/minion/server.py", line 225, in serve
    server = setup_server()
  File "/usr/lib/python2.4/site-packages/func/minion/server.py", line 220, in setup_server
    server = FuncSSLXMLRPCServer((listen_addr, listen_port), config.module_list)
  File "/usr/lib/python2.4/site-packages/func/minion/server.py", line 279, in __init__
    self.ca)
  File "/usr/lib/python2.4/site-packages/func/minion/AuthedXMLRPCServer.py", line 74, in __init__
    SimpleXMLRPCServer.SimpleXMLRPCServer.__init__(self, address, AuthedSimpleXMLRPCRequestHandler)
  File "/usr/lib64/python2.4/SimpleXMLRPCServer.py", line 473, in __init__
    SocketServer.TCPServer.__init__(self, addr, requestHandler)
  File "/usr/lib64/python2.4/SocketServer.py", line 330, in __init__
    self.server_bind()
  File "/usr/lib64/python2.4/SocketServer.py", line 341, in server_bind
    self.socket.bind(self.server_address)
  File "<string>", line 1, in bind

As you may guess from the timestamp, we see this problem most often at 4:02am on Sundays, i.e. as part of the logrotate of the func logs. Once we spot that the func service is stopped, logging in to the server and starting it has always worked so far, without needing manual removal of any pid or lock file.

One theory is that this problem occurs when the func minion is processing a command and is told to restart part way through.
From watching netstat, it looks like the func daemon stops listening on the minion port while a command runs, so that the spawned process can communicate with the master. If the daemon is stopped at that point, the spawned process still holds the port and blocks a new daemon from starting ('Address already in use'). The spawned process then exits, and we are left with no daemon at all.

Does this ring any bells with anyone? Is this a known bug? We've already added monit to mop up after this, but it would be much preferable to find a proper fix.

Alison
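
P.S. To make the theory concrete, here is a minimal sketch of the failure mode, assuming the spawned process simply still holds the listening socket on the minion port when the new daemon tries to bind. This is not func code: try_bind and demo are hypothetical helpers, both sockets live in one process rather than in a daemon and a forked child, and it is written for a current Python rather than the 2.4 shown in the traceback.

```python
import errno
import socket


def try_bind(port, host="127.0.0.1"):
    """Try to bind and listen on (host, port).

    Returns (sock, None) on success or (None, errno_code) on failure.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
        sock.listen(5)
        return sock, None
    except socket.error as exc:
        sock.close()
        return None, exc.errno


def demo():
    # Stand-in for the spawned process that still holds the minion port.
    # Port 0 asks the kernel for any free port, so the demo needs no root.
    holder, _ = try_bind(0)
    port = holder.getsockname()[1]

    # Stand-in for the restarted daemon: bind fails while the holder lives.
    _, first_err = try_bind(port)

    # Once the holder exits (here: closes its socket), the port is free again.
    holder.close()
    daemon, second_err = try_bind(port)
    if daemon is not None:
        daemon.close()
    return first_err, second_err


if __name__ == "__main__":
    first, second = demo()
    print(first == errno.EADDRINUSE, second)
```

The first attempt fails with EADDRINUSE (errno 98 on Linux, the same value as in func.log), and the second succeeds once the holder has gone away. That would be consistent with a manual start a little later always working, although it does not prove that this is what funcd is actually doing.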