Re: is it cool to restart servers as preventive maintenance?

Scott Whitney <scott@xxxxxxxxxxx> · Wed, 10 Feb 2016 21:57:22 +0000

I would absolutely recommend experimenting with higher max_lock, but I would also HIGHLY recommend checking all relevant logs when this is happening to you.

If restarting the app server is failing to connect, SOMEWHERE, SOMEONE should have logged SOMETHING.

-------- Original message --------

From: Kiriakos Georgiou <kg.postgresql@xxxxxxxxxxxxxx> 

Date: 02/10/2016 3:48 PM (GMT-06:00) 

To: pgsql-admin@xxxxxxxxxxxxxx 

Subject: Re:  is it cool to restart servers as preventive maintenance? 

> On Feb 10, 2016, at 3:18 PM, Tom Lane <tgl@xxxxxxxxxxxxx> wrote:

> 

> Kiriakos Georgiou <kg.postgresql@xxxxxxxxxxxxxx> writes:

>> In the last 12 months I have noticed 3-4 instances of database flakiness that is cured by restarting.

>> I’ve been using PostgreSQL since 2007 and I haven’t seen such issues requiring a reboot, but on my current project we do some rather heavy duty PostGIS analysis that apparently stresses the system enough to occasionally cause this (that’s my theory anyway.) 
 I’m beginning to seriously consider restarting servers on a monthly basis.

> 

> What sort of "database flakiness"?

> 

> It's possible you're encountering some kind of bug (memory leak?) in

> PostGIS, but that would be a bug you ought to get them to fix, not a

> reason why periodic restarts are a good idea.

> 

>                        regards, tom lane

> 

Flaky = the database appears to be running OK (I can run queries via psql) but our app is down for no apparent reason.  Restarting the app servers multiple times did not help.  Although the database seemed to respond fine to queries via psql, I decided to restart
 it.  That was a good move, our app worked fine after the database restart.

There is more to it: about 8 hours earlier our warm standby postgresql filled up the volume it puts the server logs by repeating the following two lines in the server log, millions of times:

WARNING:   out of shared memory

CONTEXT:   xlog redo AccessExclusive locks: xid 2002212 db 16384 rel 1079879

I had seen this on our primary about a year ago and I kept doubling max_locks_per_transaction all the way to 1024, at which point the problem did not reoccur (on the primary.)

I still occasionally (once every 3-4 months) see the “out of shared memory” message on the standby although max_locks_per_transaction has the same 1024 value as the primary.  When this happens I have to rebuilt it from the primary via pg_basebackup.  When the
 “out of shared memory” happens on the standby, it’s a coin toss whether the primary will behave flaky or not.  This time it did, and the restart fixed it.

Writing this email made me realize the likely cause of our problem.  It’s the “out of shared memory” issue.  We do have a plpgsql function that calls 30+ other plpgsql functions, some of which create temp tables.  Some of the calls are within loops.  So depending
 on data inputs we can get hundreds of locks to temp tables within a single transaction.  The odd thing is that at max_locks_per_transaction = 1024 we no longer get any “out of shared memory” on the primary, but we do on the stand by.  Any ideas about that? 
 Should I increase max_locks_per_transaction yet again?

thanks,

Kiriakos Georgiou

-- 

Sent via pgsql-admin mailing list (pgsql-admin@xxxxxxxxxxxxxx)

To make changes to your subscription:

http://www.postgresql.org/mailpref/pgsql-admin

Journyx, Inc.
7600 Burnet Road #300 

Austin, TX 78757 

www.journyx.com 

p 512.834.8888 
f 512-834-8858 

Do you receive our promotional emails? You can subscribe or unsubscribe to those emails at http://go.journyx.com/emailPreference/e/4932/714/