Peter Eisentraut <peter_e@xxxxxxx> writes:
> I have observed the following situation a few times now (weeks or months
> apart), most recently with 8.3.7. Some postgres child process crashes.
> The postmaster notices and sends SIGQUIT to all other children. Once
> all other children have exited, it would enter recovery. But for some
> reason, some children are not processing the SIGQUIT signal and are
> basically just stuck. That means the whole database system is then
> stuck and won't continue without manual intervention. If I go in
> manually and SIGKILL the offending processes, everything proceeds
> normally, recovery finishes, and the system is up again.

We need some investigation into why that is happening.

> I haven't had the chance yet to analyze why the SIGQUIT signals are
> getting stuck. Be that as it may, it appears there are no provisions
> for this case. I couldn't find any documentation or previous reports on
> this sort of thing. One might imagine a feature where the postmaster
> resorts to throwing SIGKILLs around after a while, similar to how init
> scripts are sometimes set up.

I'd prefer not to go there, at least not without a demonstration that
this will solve a bug that's unsolvable otherwise. If a child is really
stuck in a state that doesn't accept SIGQUIT, it probably won't accept
SIGKILL either (e.g., uninterruptible disk wait). Or maybe we just have
some errant code that is blocking SIGQUIT; but that's a garden-variety
bug IMO, not something that needs major new postmaster logic to work
around.

			regards, tom lane

-- 
Sent via pgsql-admin mailing list (pgsql-admin@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin
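The thread leaves two hypotheses for a child that ignores SIGQUIT: it is in uninterruptible disk wait, or some code path is blocking the signal. On Linux both can be told apart from the outside by reading /proc/<pid>/status for a stuck backend. The sketch below is an illustration and not part of PostgreSQL. It assumes a Linux host and that you already know the PIDs of the stuck children (e.g. from ps). The helper names parse_proc_status, signal_in_mask and diagnose are hypothetical. It relies on the State, SigPnd, ShdPnd, SigBlk and SigIgn fields, where bit (n - 1) of each hex mask stands for signal number n.

```python
import signal


def parse_proc_status(text: str) -> dict[str, str]:
    """Parse the 'Key:<TAB>value' lines of a Linux /proc/<pid>/status file."""
    fields: dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def signal_in_mask(hex_mask: str, signum: int) -> bool:
    """True if signal `signum` is set in a hex mask such as SigBlk.

    Bit (signum - 1) of the mask corresponds to signal number `signum`.
    """
    return bool(int(hex_mask, 16) & (1 << (signum - 1)))


def diagnose(text: str, signum: int = int(signal.SIGQUIT)) -> dict[str, object]:
    """Summarize why `signum` (SIGQUIT by default) may not be acted on.

    Hypothetical helper; Linux-only, since it reads /proc status fields.
    """
    f = parse_proc_status(text)
    state = f.get("State", "?")
    return {
        "state": state,
        # 'D' = uninterruptible sleep: signals (typically even SIGKILL)
        # are not acted on until the kernel call returns.
        "uninterruptible": state.startswith("D"),
        # Set in SigBlk: the process itself is blocking the signal.
        "blocked": signal_in_mask(f.get("SigBlk", "0"), signum),
        # Set in SigIgn: the signal is being ignored outright.
        "ignored": signal_in_mask(f.get("SigIgn", "0"), signum),
        # Pending for the thread (SigPnd) or the whole process (ShdPnd).
        "pending": signal_in_mask(f.get("SigPnd", "0"), signum)
        or signal_in_mask(f.get("ShdPnd", "0"), signum),
    }
```

Usage, for a hypothetical stuck child with PID 12345: `diagnose(open("/proc/12345/status").read())`. A result with "pending" and "blocked" both true points at errant code holding SIGQUIT blocked, the "garden-variety bug" case. A state starting with "D" points at disk wait, where a later SIGKILL would likely not help either.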