Disclaimer: My comments here are generic to Windows services. I don't
run PostgreSQL on Windows and I have no idea how its service is
implemented.

On Sun, 1 May 2016 03:35:44 +0100, Tom Hodder <tom@xxxxxxxxxxxxxxxx> wrote:

>I've got several machines running windows 7 which have postgresql 9.4
>installed as a service, and configured to start automatically on boot. I am
>monitoring these services with zabbix and several times a week I get a
>notification that the postgresql-x64-9.4 service has stopped.
>
>When I login to the machine, the service does appear to be stopped;
>
>However when I check the database, I can query it ok;

Windows services have a time limit to respond to commands or status
inquiries. The service manager periodically queries the status of all
running services - if they don't respond quickly enough, the manager
thinks they are hosed. That may or may not be true.

But IME unresponsive services rarely appear "stopped" - usually they
show as "started" in the service manager, or, if you run SC from the
command line, their state is shown as "running".

>If I try to start the service from the service manager, I see the following
>error in the logs;
>
>2016-04-30 05:03:13 BST FATAL: lock file "postmaster.pid" already exists
>2016-04-30 05:03:13 BST HINT: Is another postmaster (PID 2556) running in data directory "C:/Program Files/PostgreSQL/9.4/data"?
>
>The pg_ctl tool seems to correctly query the state of the service and
>return the correct PID;
>
>C:\Program Files\PostgreSQL\9.4>bin\pg_ctl.exe -D "C:\Program Files\PostgreSQL\9.4\data" status
>pg_ctl: server is running (PID: 2556)

Which suggests the service either is not responding to the manager's
status inquiries, or is responding too late.

>The other thing that seems to happen is the pgadmin3 tool seems to
>have lost the ability to control the service as all the options for
>start/stop are greyed out;
>[image: Inline images 2]

This is likely because the service manager believes the service is
unresponsive.
The programmatic service-control API talks to the manager, not to the
service itself.

>The only option to get the control back is to kill the processes in
>the task manager or reboot the machine.

You could try "sc stop <service>" from the command line. The SC tool
is separate from the shell "net" command and it sometimes will work
when "net stop <service>" does not.

You also could try using the recovery options in the service manager to
automatically restart the service. But if the service is showing as
"stopped" when it really is running, this is unlikely to work.

>Any suggestions on what might be causing this?

Services are tricky to get right: there are a number of rules the
control interface has to obey that are at odds with doing real work.

A single-threaded service must periodically send "busy" status to the
manager during lengthy processing. Failure to do that in a timely
manner will cause problems.

A multi-threaded service that separates processing from control must
be able to suspend or halt the processing when directed, and must send
"busy" status if it can't.

There is a way to launch arbitrary programs as services so they can run
at startup and in the background, but programs that weren't written
explicitly to BE services don't obey the service manager, and their
displayed status usually is bogus (provided by the launcher).

George
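
PS: If you want to confirm the mismatch from a script rather than by
eye, something along these lines might do it. This is only a rough
sketch and I have not run it against a real PostgreSQL install. It
assumes Python 3 is available on the machine and that "sc query" and
pg_ctl print English messages in their usual layout. The service name
and paths are the ones from your mail; the function names (scm_state,
pg_ctl_pid, diagnose) are made up for the example.

```python
import re
import subprocess

# Assumed values, copied from the original report; adjust for your install.
SERVICE = "postgresql-x64-9.4"
PG_CTL = r"C:\Program Files\PostgreSQL\9.4\bin\pg_ctl.exe"
DATA_DIR = r"C:\Program Files\PostgreSQL\9.4\data"


def scm_state(sc_output):
    """Return the state word (e.g. 'RUNNING', 'STOPPED') from `sc query` text, or None."""
    m = re.search(r"^\s*STATE\s*:\s*\d+\s+(\w+)", sc_output, re.MULTILINE)
    return m.group(1) if m else None


def pg_ctl_pid(pg_ctl_output):
    """Return the PID from a 'server is running (PID: N)' message, or None."""
    m = re.search(r"server is running \(PID:\s*(\d+)\)", pg_ctl_output)
    return int(m.group(1)) if m else None


def diagnose(sc_output, pg_ctl_output):
    """Compare what the service manager believes with what pg_ctl reports."""
    state = scm_state(sc_output)
    pid = pg_ctl_pid(pg_ctl_output)
    if state is None:
        return "unknown: could not parse the sc output"
    if state != "RUNNING" and pid is not None:
        return f"mismatch: service manager says {state}, but postmaster PID {pid} is running"
    if state == "RUNNING" and pid is None:
        return "mismatch: service manager says RUNNING, but pg_ctl found no server"
    return f"consistent: service manager says {state}"


def run(cmd):
    """Run a command and return everything it printed (stdout and stderr)."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return proc.stdout + proc.stderr


if __name__ == "__main__":
    sc_text = run(["sc.exe", "query", SERVICE])
    pg_text = run([PG_CTL, "-D", DATA_DIR, "status"])
    print(diagnose(sc_text, pg_text))
```

Running it when zabbix fires should tell you whether you are looking
at the "manager says stopped, postmaster still alive" case or at a
real outage. It only reads state; it does not try to fix anything.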