Re: "show all" command crashes server * FIXED *

Grant Maxwell <grant.maxwell@xxxxxxxxxxxx> · Mon, 14 Sep 2009 09:32:30 +1000

First of all, thanks to those who provided input.
The problem is now fixed, and I thought I would post the solution so that others might benefit in the future.

For the sake of completeness:

	The error was that whenever "show all" was run on this PostgreSQL (version 8.3) server, postgres would crash and then recover.
	Otherwise the server "seemed" healthy.

	The postgres log showed:
Sep 10 23:55:36 theconsole postgres[31118]: [4-1]     0: LOG:  00000: server process (PID 31145) was terminated by signal 11: Segmentation fault
Sep 10 23:55:36 theconsole postgres[31118]: [4-2]     0: LOCATION:  LogChildExit, postmaster.c:2529
Sep 10 23:55:36 theconsole postgres[31118]: [5-1]     0: LOG:  00000: terminating any other active server processes
Sep 10 23:55:36 theconsole postgres[31118]: [5-2]     0: LOCATION:  HandleChildCrash, postmaster.c:2374
Sep 10 23:55:36 theconsole postgres[31118]: [6-1]     0: LOG:  00000: all server processes terminated; reinitializing
Sep 10 23:55:36 theconsole postgres[31118]: [6-2]     0: LOCATION:  PostmasterStateMachine, postmaster.c:2690
Sep 10 23:55:36 theconsole postgres[31146]: [7-1]     0: LOG:  00000: database system was interrupted; last known up at 2009-09-10 23:55:14 EST
Sep 10 23:55:36 theconsole postgres[31146]: [7-2]     0: LOCATION:  StartupXLOG, xlog.c:4836
Sep 10 23:55:36 theconsole postgres[31147]: [7-1]  [local] postgres postgres 0: FATAL:  57P03: the database system is in recovery mode
Sep 10 23:55:36 theconsole postgres[31147]: [7-2]  [local] postgres postgres 0: LOCATION:  ProcessStartupPacket, postmaster.c:1648
Sep 10 23:55:36 theconsole postgres[31146]: [8-1]     0: LOG:  00000: database system was not properly shut down; automatic recovery in progress
Sep 10 23:55:36 theconsole postgres[31146]: [8-2]     0: LOCATION:  StartupXLOG, xlog.c:5003
Sep 10 23:55:36 theconsole postgres[31146]: [9-1]     0: LOG:  00000: record with zero length at 2A/E734761C
Sep 10 23:55:36 theconsole postgres[31146]: [9-2]     0: LOCATION:  ReadRecord, xlog.c:3126
Sep 10 23:55:36 theconsole postgres[31146]: [10-1]     0: LOG:  00000: redo is not required
Sep 10 23:55:36 theconsole postgres[31146]: [10-2]     0: LOCATION:  StartupXLOG, xlog.c:5146
Sep 10 23:55:36 theconsole postgres[31150]: [7-1]     0: LOG:  00000: autovacuum launcher started
Sep 10 23:55:36 theconsole postgres[31150]: [7-2]     0: LOCATION:  AutoVacLauncherMain, autovacuum.c:520
Sep 10 23:55:36 theconsole postgres[31118]: [7-1]     0: LOG:  00000: database system is ready to accept connections
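
For anyone who wants to spot this pattern in their own logs, here is a minimal Python sketch that pulls out the child-crash lines. The regular expression is based only on the message wording in the log above. The name find_crashes is made up for illustration, and other PostgreSQL versions or log prefix settings may need a different pattern.

```python
import re

# Matches the postmaster message seen in the log above, e.g.
# "server process (PID 31145) was terminated by signal 11: Segmentation fault"
CRASH_RE = re.compile(
    r"server process \(PID (?P<pid>\d+)\) was terminated by signal "
    r"(?P<signal>\d+)(?:: (?P<name>.+))?"
)


def find_crashes(lines):
    """Return (pid, signal number, signal name) for each crash line found.

    find_crashes is a hypothetical helper, not part of PostgreSQL.
    """
    crashes = []
    for line in lines:
        match = CRASH_RE.search(line)
        if match:
            name = (match.group("name") or "").strip()
            crashes.append(
                (int(match.group("pid")), int(match.group("signal")), name)
            )
    return crashes
```

Feeding it the lines of the syslog file (for example via open(path) on whatever file your syslog writes postgres messages to) gives a list of crashed backend PIDs and the signals that killed them.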

SOLUTION:
	Increase the memory on the server.

WHY:
	We had installed Splunk on the server about a month earlier, and it had been running OK.
	The combination of Splunk and the other tasks on the box had pushed memory usage too close to the limit.
	What we did not notice was that swap had been almost completely consumed - nasty.
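
A check along these lines might have caught it sooner. This is only a sketch, assuming a Linux host where /proc/meminfo reports SwapTotal and SwapFree in kB. The function names and the 80% threshold are illustrative assumptions, not part of any particular monitoring system.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        # The first token after the colon is the number; a "kB" unit may follow.
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def swap_used_fraction(fields):
    """Return the fraction of swap in use (0.0 if no swap is configured)."""
    total = fields.get("SwapTotal", 0)
    if total == 0:
        return 0.0
    return (total - fields.get("SwapFree", 0)) / total


def swap_alert(text, threshold=0.8):
    """True if swap usage is at or above the (assumed) alert threshold."""
    return swap_used_fraction(parse_meminfo(text)) >= threshold


def read_meminfo(path="/proc/meminfo"):
    """Read the meminfo file; the default path assumes Linux."""
    with open(path) as f:
        return f.read()
```

Running swap_alert(read_meminfo()) from cron and mailing on True would be a crude but workable stopgap until the box is covered by proper monitoring.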

RESULT:
	We shut it all down, doubled the memory and voila - problem gone.

It goes to show that when hunting problems we should not ignore the basic environmental elements.
It also goes to show that our monitoring system was not looking at this relatively new server.
(this confession is not an invitation for a spanking)

Again, thanks for the help.
Grant

On 11/09/2009, at 9:09 AM, Grant Maxwell wrote:

On 11/09/2009, at 8:36 AM, Tom Lane wrote:

Grant Maxwell <grant.maxwell@xxxxxxxxxxxx> writes:
On the problem server:
	shared_preload_libraries = 'pgmemcache'
	#local_preload_libraries = ''

On the others both are empty.

Sounds like a smoking gun to me.

For good measure I removed pgmemcache but the problem persists.

Did you restart the postmaster afterwards?  shared_preload_libraries
is only considered at postmaster start.

	yep - full restart.

			regards, tom lane

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
