Kevin Kempter wrote:
Hi List;
I've just inherited multiple postgres database servers in multiple data
centers across the US and Europe via a new contract I've just started.
Each night during the nightly batch processing several of the servers (2 in
particular) slow to a crawl - they are dedicated postgres database servers.
There is a lot of database activity going on sometimes upwards of 200
concurrent queries however I just dont think that the machines should be this
pegged. I am in the process of cleaning up dead space - their #1 fix for
performance issues in the past is to kill the current vacuum process.
Likewise I've just bumped shared_buffers to 150000 and work_mem to 250000.
Well, allowing vacuum to do its job can clearly only help matters. I'm
not sure about setting work_mem so high though. That's the memory you're
using per-sort, so you can use multiples of that in a single query. With
200 concurrent queries I'd worry about running into swap. If you're
doing it just for the batch processes that might make sense.
You might well want to set maintenance_work_mem quite high though, for
any overnight maintenance.
A shared_buffers of 1.2GB isn't outrageous, but again with 200 backend
processes you'll want to consider how much memory each process will
consume. It could be that you're better off with a smaller
shared_buffers and relying more on the OS doing its disk caching.
Even at that I still see slow processing/high system loads at nite.I have
noticed that killing the current vacuum process (autovacuum is turned on)
speeds up the entire machine significantly.
If it's disk i/o that's the limiting factor you might want to look at
the "Cost-Based Vacuum Delay" section in the configuration settings.
The servers are 4-CPU intel boxes (not dual-core) with 4Gig of memory and
attached to raid-10 array's
Any thoughts on where to start?
Make sure you are gathering stats and at least stats_block_level stuff.
Then have a cron-job make copies of the stats tables, but adding a
timestamp column. That way you can run diffs against different time periods.
Pair this up with top/vmstat/iostat activity.
Use log_min_duration_statement to catch any long-running queries so you
can see if you're getting bad plans that push activity up.
Try and make only one change at a time, otherwise it's difficult to tell
what's helping/hurting.
--
Richard Huxton
Archonet Ltd
---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@xxxxxxxxxxxxxx so that your
message can get through to the mailing list cleanly