Re: is it cool to restart servers as preventive maintenance?

I would absolutely recommend experimenting with a higher max_locks_per_transaction, but I would also HIGHLY recommend checking all relevant logs when this is happening to you.

If the app server is failing to connect, SOMEWHERE, SOMEONE should have logged SOMETHING.
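To put that concretely, a quick grep is usually enough to confirm whether the server log recorded anything around the incident. A self-contained sketch (the log path and contents below are made up for illustration; substitute your actual log_directory):

```shell
# Self-contained demo: write a sample server log, then count occurrences
# of the warning discussed later in this thread. The file path and
# contents are illustrative, not a real PostgreSQL log location.
cat > /tmp/pg_demo.log <<'EOF'
WARNING:  out of shared memory
CONTEXT:  xlog redo AccessExclusive locks: xid 2002212 db 16384 rel 1079879
WARNING:  out of shared memory
EOF
grep -c "out of shared memory" /tmp/pg_demo.log   # prints 2
```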


-------- Original message --------
From: Kiriakos Georgiou <kg.postgresql@xxxxxxxxxxxxxx>
Date: 02/10/2016 3:48 PM (GMT-06:00)
To: pgsql-admin@xxxxxxxxxxxxxx
Subject: Re: is it cool to restart servers as preventive maintenance?


> On Feb 10, 2016, at 3:18 PM, Tom Lane <tgl@xxxxxxxxxxxxx> wrote:
>
> Kiriakos Georgiou <kg.postgresql@xxxxxxxxxxxxxx> writes:
>> In the last 12 months I have noticed 3-4 instances of database flakiness that is cured by restarting.
>> I’ve been using PostgreSQL since 2007 and I haven’t seen such issues requiring a reboot, but on my current project we do some rather heavy-duty PostGIS analysis that apparently stresses the system enough to occasionally cause this (that’s my theory, anyway). I’m beginning to seriously consider restarting servers on a monthly basis.
>
> What sort of "database flakiness"?
>
> It's possible you're encountering some kind of bug (memory leak?) in
> PostGIS, but that would be a bug you ought to get them to fix, not a
> reason why periodic restarts are a good idea.
>
>                        regards, tom lane
>



Flaky = the database appears to be running OK (I can run queries via psql) but our app is down for no apparent reason.  Restarting the app servers multiple times did not help.  Although the database seemed to respond fine to queries via psql, I decided to restart it too.  That was a good move: our app worked fine after the database restart.

There is more to it: about 8 hours earlier our warm standby postgresql filled up the volume it writes its server logs to, by repeating the following two lines in the server log, millions of times:

WARNING:  out of shared memory
CONTEXT:  xlog redo AccessExclusive locks: xid 2002212 db 16384 rel 1079879

I had seen this on our primary about a year ago, and I kept doubling max_locks_per_transaction all the way up to 1024, at which point the problem stopped recurring (on the primary).
I still occasionally (once every 3-4 months) see the “out of shared memory” message on the standby, even though max_locks_per_transaction has the same value of 1024 as on the primary.  When this happens I have to rebuild the standby from the primary via pg_basebackup.  When the “out of shared memory” error occurs on the standby, it’s a coin toss whether the primary will act flaky as well.  This time it did, and the restart fixed it.
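For context on why even 1024 may not be enough: PostgreSQL sizes the shared lock table at roughly max_locks_per_transaction * (max_connections + max_prepared_transactions) slots, and that pool is shared by all sessions, so one greedy transaction can use far more than max_locks_per_transaction slots if they happen to be free. A back-of-envelope sketch (the max_connections value below is an invented assumption, not taken from this thread):

```python
# Rough capacity estimate for PostgreSQL's shared lock table, using the
# documented sizing formula. The settings below are illustrative
# assumptions, not values reported in this thread.
max_locks_per_transaction = 1024
max_connections = 100
max_prepared_transactions = 0

# Total lock slots shared by ALL sessions; any single transaction can
# consume more than max_locks_per_transaction as long as slots remain.
lock_slots = max_locks_per_transaction * (max_connections + max_prepared_transactions)
print(lock_slots)  # 102400
```

This is also a plausible reason the standby behaves differently: if the standby runs with a lower max_connections than the primary, its lock table is proportionally smaller even with an identical max_locks_per_transaction.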

Writing this email made me realize the likely cause of our problem: it’s the “out of shared memory” issue.  We have a plpgsql function that calls 30+ other plpgsql functions, some of which create temp tables.  Some of the calls are within loops, so depending on data inputs we can acquire hundreds of locks on temp tables within a single transaction.  The odd thing is that at max_locks_per_transaction = 1024 we no longer get any “out of shared memory” errors on the primary, but we still do on the standby.  Any ideas about that?  Should I increase max_locks_per_transaction yet again?
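To see how quickly this adds up: each temp table holds locks until commit, and a temp table typically brings more than one lock with it (the table itself, plus any index or TOAST relation). A hypothetical worst-case count, with all figures invented purely to illustrate the arithmetic:

```python
# Hypothetical worst-case lock estimate for one transaction that calls
# plpgsql functions in loops, each creating temp tables. Every number
# here is an invented illustration, not a measured value.
loop_iterations = 200          # data-dependent loop count
temp_tables_per_iteration = 2  # temp tables created per iteration
locks_per_temp_table = 3       # table + index + TOAST relation, roughly

locks_held = loop_iterations * temp_tables_per_iteration * locks_per_temp_table
print(locks_held)  # 1200 -- already past a 1024 max_locks_per_transaction
```

Since the locks persist until the transaction commits, a data-driven spike in loop iterations is enough to blow past any fixed setting.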

thanks,
Kiriakos Georgiou

--
Sent via pgsql-admin mailing list (pgsql-admin@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin


Journyx, Inc.
7600 Burnet Road #300
Austin, TX 78757
www.journyx.com

p 512.834.8888 
f 512-834-8858 

