One of the most common causes I've seen for this is linux's vm.*dirty* settings to get in the way. Like so many linux kernel "optimizations" this one looks good on paper but gives at best middling improvements with occasional io storms that block everything else. On big mem machines doing a lot of writing IO I just set these to 0. Also tend to turn off swap as well as it's known to get in the way as well.
settings for /etc/sysctl.conf
vm.dirty_background_ratio = 0
vm.dirty_ratio = 0
On Tue, Apr 9, 2013 at 3:41 AM, Christian Hammers <ch@xxxxxxxxxxxx> wrote:
Hello
I have a setup with one master and two slaves which are used by a closed
source application. The database is asked the same query, a stored procedure,
with different parameters about 4 million times per second at a peak rate of
150 times per second using 10 parallel connections. The slaves are decent
Dell servers with 64GB RAM with dual hexacore CPUs and RAID10.
Usually this stored procedure takes <1ms as it basically just does two
selects against a GIST index for a prefix_range type. Seldomly though,
about 1-3 times per day, one of these queries takes up to 4000ms!
All those queries also runs in <1ms when executed manually some times later.
Queries with similar parameters (who I supposed to use the same area of the
index) also continues to run fast during that time. Queries with different
paramers which are running parallel on different threads take <1ms, too,
so it's not a general "load problem".
Cronjobs and other applications seem quiet during that time, there is
no peak in any of our monitoring graphs. Automatic vacuum/analyze log
entries on the master are not near the timestamps in question.
So my problem seems not the query itself nor the way I indexed my data
but what could it be? Some strange effects with streaming replication
or cache invalidation?
Apologies for not giving you reproducible problem but maybe you
still have some ideas as I'm just curious as I've never seem such an
effect during my MySQL years :-) The queries contain obvious customer
data so I'm reluctant to give examples but again I doubt that
an explain plan will help if only 1 out of 4E6 queries takes too long.
bye,
-christian-
--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
To understand recursion, one must first understand recursion.