On Thu, Sep 8, 2016 at 4:30 AM, Moreno Andreo <moreno.andreo@xxxxxxxxxx> wrote:
Hi folks! :-)
This morning I was woken up by a call of a coworker screaming "Help, our Postgres server is throwing strange errors!"
Not the best way to start your day...
OK, to the serious part.
"Strange errors" were (in postgresql-9.1-main.log)
WARNING: out of shared memory
ERROR: out of shared memory
HINT: you may need to increase max_locks_per_transaction
Restarting Postgresql solved the issue (for now), but that's what I'm wondering:
- the greatest part of this locks are used by rubyrep (that we use to replicate users' databases), no new users since 3 weeks, first time error show up in almost 2 years
- I read this: https://wiki.postgresql.org/wiki/Lock_Monitoring but still I can't figure out what to do if I need to know if I have to be worried or not :-)
- I have
OS: Ubuntu 12.04 (won't upgrade because we are leaving this server to a new one with Debian Jessie)
PG: 9.1.6 (same as above, in new server ve have 9.5.4)
You have a problem now. Upgrading PG now might help you solve the problem you have now. Why would you choose to work with one hand tied behind your back now, just because you were already planning on upgrading later? Also, moving both the OS and the PG version at the same time is great if everything goes well. But if everything doesn't go well, you have greatly increased the scope of the problem-solving by making two changes at once. If it were me (and my employer gave me the freedom to do my job effectively), I'd rather spend my time bringing forward the date on which I upgrade PG, rather than spend that time tracking down problems that occur, or at least are hard to track down, because I am running an obsolete PG. Just double max_locks_per_transaction (with 32GB of ram, that should not be a problem) and call it good until after the upgrade.
RAM: 32 GB
shared_buffers = 2GB
max_connections=800
max_locks_per_transaction=64 (default value)
max_prepared_transactions = 0
so, I should be able to manage 800*64 = 5120 locks, right?
Now my pg_locks table has more than 6200 rows, but if I reorder them by pid I see that one of them has 5800 of them, and it keeps on eating locks.
If I dig more and get pid info, its state is "<IDLE> in transaction"
On PG9.2 or above, you would be able to see the most recently run statement, in addition to the state. That could help a lot in figuring out how it doing this lock-fest (of course you can already get the client_hostname and the application_name, which could also be a big help).
What is the distribution of locks by type?
select locktype, count(*) from pg_locks group by 1;
ATM there are no locks that have granted = false.
Now, question time:
- Is there a number of pg_locks rows to be worried about? At more than 6000 I'm still not facing out of shared memory again
I don't think that the exact number of locks that will fit in shared memory is predictable. For example, if different processes co-hold sharable locks on the same list of tables, it seems to take up slightly more memory than if each process was locking a different list tables. And different lock types also take different amounts of memory. And memory fragmentation might also cause changes in capacity that are hard to predict--I can run the same parallel program repeated, and have it sometimes run out of memory and sometimes not.
- Is there a way to release locks of that pid without pg_terminate() it?
I'm afraid not. I believe your two options are pg_terminate_backend or things which are functionally equivalent to that; or to identify who is doing this (see application_name and client_hostname) and haranguing them until they stop doing it.
Cheers,
Jeff