Thank you so much for the input and the detail provided.

I'll be making the configuration changes, probably over the course of the week, checking the effect after each (as reminded by Scott Marlowe). I was pushed to put the new version into production over the weekend, which at least may provide me with some accurate feedback, so I will see what happens for a bit before addressing the disk/drive layout.

-Midge
----- Original Message -----
Sent: Sunday, August 21, 2011 12:20 PM
Subject: Re: settings input for upgrade

On 08/18/2011 05:55 PM, Midge Brown wrote:
DB1 is 10GB and consists of multiple tables that I've spread out so that the 3 most used have their data and indexes on 6 separate RAID1 drives, the 3 next busiest have data & index on 3 drives, and the remaining tables and indexes are on the RAID10 drive. The WAL for all is on a separate RAID1 drive.

DB2 is 25GB with data, index, and WAL all on separate RAID1 drives.

DB3 is 15GB with data, index, and WAL on separate RAID1 drives.

Anytime you have a set of disks and a set of databases/tables to lay out onto them, there are two main options to consider:
-Put all of them into a single RAID10 array. Performance will be high no matter what subset of the database is being used. But if one particular part of a database is really busy, it can divert resources away from the rest.
-Break the database into fine-grained pieces and carefully lay out each of them on disk. Performance of any individual chunk will be steady here. But if only a subset of the data is being used, server resources will be idle. All of the disks that don't have data related to that will be unused.
Consider two configurations following these ideas:

1) 12 disks are placed into a large RAID10 array. Peak transfer rate will be about 600MB/s on sequential scans.

2) 6 RAID1 arrays are created and the database is manually laid out onto those disks. Peak transfer rate from any one section will be closer to 100MB/s.
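As an aside, the manual layout in (2) is normally declared with tablespaces. A minimal sketch, where the mount points and the table are hypothetical stand-ins for individual RAID1 arrays and a hot table:

    -- Hypothetical mount points, one per RAID1 array:
    CREATE TABLESPACE busy_data  LOCATION '/mnt/array1/pgdata';
    CREATE TABLESPACE busy_index LOCATION '/mnt/array2/pgdata';
    -- Pin a hot table's heap and its index to separate spindles:
    CREATE TABLE orders (id integer, placed_at timestamptz) TABLESPACE busy_data;
    CREATE INDEX orders_id_idx ON orders (id) TABLESPACE busy_index;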
Each of these is optimizing for a different use scenario. Here's the best case for each:

-One user is active, and they're hitting one of the database sections. In setup (1) they might get 600MB/s, the case where it shows the most benefit. In setup (2), they'd only get 100MB/s.
-10 users are pounding one section of the database; 1 user is hitting a different section. In setup (2), all 10 users will be fighting over access to one section of the disk, each getting (at best) 10MB/s of its transfers. The nature of random I/O means that it will likely be much worse for them. Meanwhile, the user hitting the other database section will still be merrily chugging away getting their 100MB/s. Had setup (1) been used, you'd have 11 users fighting over 600MB/s, so at best 55MB/s for each. And with the random mix, it could be much worse.
Which of these is better? Well, (1) is guaranteed to use your hardware to its fullest capability. There are some situations where contention over the disk array will cause performance to be lower for some people, compared to if they had an isolated environment split up more like (2). But the rest of the time, (2) will have taken a large number of disks and left them idle. The second example shows this really well. The mere fact that you have such a huge aggregate speed available means that the big array really doesn't necessarily suffer that badly from a heavy load. It has 6X as much capacity to handle it. You really need to have a >6:1 imbalance in access before the carefully laid out version pulls ahead. In every other case, the big array wins.
You can defend (2) as the better choice if you have really compelling, hard data proving that use of the various parts of the data is split quite evenly across the expected incoming workload. If you have response time latency targets that require separating resources evenly among the various types of users, it can also make sense there. I don't know whether the data you've been collecting from your older version is good enough to know that for sure or not.
In every other case, you'd be better off just dumping the whole pile into a single, large array, and letting the array and operating system figure out how to schedule things best. That's why this is the normal practice for building PostgreSQL systems. The sole exception is that splitting out the pg_xlog filesystem can usually be justified even with a larger array. The fact that it's always sequential I/O means that mixing its work with the rest of the server doesn't work as well as giving it a dedicated pair of drives to write to, where it doesn't ever stop to seek somewhere else.
wal_buffers = 32MB

This might as well drop to 16MB. And you've already gotten some warnings about work_mem. Switching to a connection pooler would help with that, too.
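In postgresql.conf terms, a sketch of that suggestion (the comments are mine, not from this thread):

    wal_buffers = 16MB    # little measurable gain beyond this size
    # work_mem is allocated per sort/hash operation per backend, so a
    # connection pooler caps how much of it can be in use at once.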
autovacuum_analyze_threshold = 250
autovacuum_naptime = 10min
autovacuum_vacuum_threshold = 250
vacuum_cost_delay = 10ms

This strikes me as more customization than you really should be doing to autovacuum, if you haven't been running on a recent version of PostgreSQL yet. You shouldn't ever need to touch the thresholds, for example. Those only matter on really small tables; once something gets big enough to really matter, the threshold part is really small compared to the scale factor one. And the defaults are picked partly so that cleanup of the system catalog tables is done frequently enough. You're slowing that cleanup by moving the thresholds upward so much, and that's not a great idea.
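For reference, the trigger autovacuum uses (per the PostgreSQL documentation) is:

    vacuum threshold = autovacuum_vacuum_threshold
                     + autovacuum_vacuum_scale_factor * reltuples

With the defaults (50 and 0.2), a 1,000,000-row table isn't vacuumed until roughly 200,050 rows are dead, so raising the base threshold to 250 changes essentially nothing there; on a 200-row catalog table, the same change moves the trigger from 90 dead rows to 290, more dead rows than the table has live ones.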
For similar reasons, you really shouldn't be touching autovacuum_naptime unless there's really good evidence it's necessary for your environment.
Changing things such that regular vacuums executed at the command line happen with a cost delay like this should be fine though. Those will then run using about twice as many resources as the autovacuum ones (a 10ms delay versus autovacuum's 20ms default), but no longer flat-out the way an unthrottled manual VACUUM normally does.
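One way to throttle just a single manual run, as a sketch (the table name is a placeholder):

    -- Applies to this session only; reverts when the session ends.
    SET vacuum_cost_delay = '10ms';
    VACUUM VERBOSE some_table;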
deadlock_timeout = 3s

You probably don't want to increase this. When you reach the point where you want to find slow lock issues by turning on log_lock_waits, you're just going to put it right back to the default again--or lower it.
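The usual pattern for that later investigation looks like this in postgresql.conf (a sketch, not a setting from this thread):

    log_lock_waits = on      # log a message when a lock wait exceeds
    deadlock_timeout = 1s    # this timeout (1s is the default)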
--
Greg Smith 2ndQuadrant US greg@xxxxxxxxxxxxxxx Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us