On Wed, Jul 11, 2012 at 9:24 AM, Yan Chunlu <springrider@xxxxxxxxx> wrote:
> I have logged one day of data and found that checkpoints are rather
> frequent (detail: https://gist.github.com/3088338). Not sure if it is
> normal, but the average checkpoint time is about 100 sec~200 sec, which
> seems related to my settings:
>
> 574 checkpoint_segments = 64
> 575 wal_keep_segments = 5000
>
> I set checkpoint_segments to a very large value because otherwise the
> slave server can never keep up with the master. Should I lower that
> value?
>
> Or are the slow queries caused by something else? Thanks!

Some things to notice from the checkpoint log:

* All checkpoints are triggered by checkpoint_timeout, using up only a
  couple of log files
* Checkpoints write out around 40MB of buffers
* The write-out period is spread out nicely, as it's supposed to be, but
  the sync phase is occasionally taking a very long time (more than 2
  minutes)

This looks like something (not necessarily the checkpoint sync itself) is
overloading the I/O system. You might want to monitor the I/O load with
iostat and correlate it with the checkpoints and slow queries to find the
culprit. It's also possible that something else is causing the issues.

If the cause is checkpoints, just making them less frequent might make the
problem worse. I'm assuming you have 16GB+ of RAM, because you have 4GB of
shared_buffers. Just making checkpoint_timeout longer will accumulate a
larger number of dirty buffers that will clog up the I/O queues even worse.
If you are on Linux, lowering dirty_expire_centisecs or
dirty_background_bytes might help spread the load out, but will make
overall throughput worse.

On the other hand, if the I/O overload is from queries (more likely,
because some checkpoints sync quickly), there are no easy tuning answers.
Making queries less I/O intensive is probably the best you can do.
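To line the two up, one approach (a sketch, not a recipe: the log path is a placeholder, the example log line and sysctl values are illustrative) is to sample iostat alongside the server log with log_checkpoints = on, then pull out the sync-phase durations and match them against the I/O spikes:

```shell
# Run in parallel with the workload (iostat is in the sysstat package):
#   iostat -x -t 5 > /tmp/iostat.log
#
# With log_checkpoints = on, the server logs lines like:
#   LOG:  checkpoint complete: wrote 5000 buffers (1.5%); ...
#         write=269.5 s, sync=141.8 s, total=415.0 s
# Pull out the sync-phase durations to match against the iostat samples:
PGLOG=${PGLOG:-/var/log/postgresql/postgresql.log}   # placeholder path
grep -o 'sync=[0-9.]* s' "$PGLOG"

# If checkpoint writeback turns out to be the problem, these can spread
# it out, at some cost in overall throughput (values illustrative):
#   sysctl -w vm.dirty_expire_centisecs=500        # default 3000 (30 s)
#   sysctl -w vm.dirty_background_bytes=67108864   # 64 MB
```

If the iostat samples show the disks saturated while sync times are short, the load is coming from queries rather than checkpoints.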
From the tuning side, newer Linux kernels handle I/O fairness a lot
better, and you could also try tweaking the I/O scheduler to achieve
better throughput and avoid congestion, or at least provide better
latency for trivial queries. And of course it's always possible to throw
more hardware at the problem and upgrade the I/O subsystem.

Ants Aasma
--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de
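P.S. A sketch of inspecting and switching the I/O scheduler on Linux (the device name sda is an assumption; at the time cfq was the usual default, and deadline often gives better latency for database workloads):

```shell
DEV=${DEV:-sda}                              # assumed device name
SCHED_FILE=/sys/block/$DEV/queue/scheduler
# The file lists the available schedulers with the active one in
# brackets, e.g. "noop deadline [cfq]"; print the active one:
sed 's/.*\[\(.*\)\].*/\1/' "$SCHED_FILE"
# To switch (as root; lost on reboot unless set via boot parameters):
#   echo deadline > "$SCHED_FILE"
```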