All fair questions...
Thank you for your detailed response!
On 1/4/2013 11:03 PM, Jeff Janes wrote:
On Friday, January 4, 2013, AJ Weber wrote:
Hi all,
I have a table that has about 73mm rows in it and growing.
How big is the table in MB? Its indexes?
Not sure on this. Will see if pgAdmin tells me.
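For reference, PostgreSQL's built-in size functions will answer this
without pgAdmin. A minimal sketch, using "order_details" as a stand-in
for the real table name:

    -- Heap, index, and total on-disk sizes for one table.
    SELECT pg_size_pretty(pg_relation_size('order_details'))       AS heap_size,
           pg_size_pretty(pg_indexes_size('order_details'))        AS index_size,
           pg_size_pretty(pg_total_relation_size('order_details')) AS total_size;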
...
The server has 12GB RAM, 4 cores, but is shared with a big
webapp running in Tomcat -- and I only have a RAID1 disk to work
on. Woe is me...
By a RAID1 disk, do you mean two disks in a RAID1
configuration, or a single RAID1 composed of an unspecified
number of disks?
Often spending many thousands of dollars in DBA time can save
you from having to buy many hundreds of dollars in hard drives.
:) On the other hand, often you end up having to buy the extra
disks anyway, after all.
I mean I have two disks in a RAID1 configuration. The server is
currently in a whitebox datacenter and I have zero control over the
hardware, so adding disks is unfortunately out of the question. I
completely understand the comment, and would love to have a larger
SAN available to me that I could configure...I just don't and have
no way of getting one anytime soon.
Anyway, this table is going to continue to grow, and it's used
frequently (Read and Write).
Are all rows in the table read and written with equal vigor,
or are there hot rows and cold rows that can be recognized based
on the row's values?
No, I could probably figure out a way to set up an "archive" or
"older" section of the data that is updated much less frequently.
Deletes are rare. Inserts/Updates, yes. Selects on existing rows
-- very frequent.
From what I
read, this table is a candidate to be partitioned for
performance and scalability. I have tested some scripts to
build the "inherits" tables with their constraints and the
trigger/function to perform the work.
Am I doing the right thing by partitioning this?
Probably not. Or at least, you haven't given us the
information to know. Very broadly speaking, well-implemented
partitioning makes bulk loading and removal operations take less
IO, but makes normal operations take more IO, or if lucky
leaves it unchanged. There are exceptions, but unless you can
identify a very specific reason to think you might have one of
those exceptions, then you probably don't.
I know you can't believe everything you read, but I thought I saw
some metrics about when a table's size exceeds some fraction of
available RAM, or when it approaches 100mm rows, it becomes a strong
candidate for partitioning.
Do you have a natural partitioning key? That is, is there a
column (or expression) which occurs as a selective component in
the where clause of almost all of your most IO-consuming SQL and
DML? If so, you might benefit from partitioning on it. (But in
that case, you might be able to get most of the benefits of
partitioning, without the headaches of it, just by revamping
your indexes to include that column/expression as their leading
field).
If you don't have a good candidate partitioning key, then
partitioning will almost surely make things worse.
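As an illustration of the leading-field suggestion, with hypothetical
table and column names, assuming order_id is the column that shows up
in nearly every WHERE clause:

    -- Put the would-be partitioning key first so the index itself
    -- narrows the scan the way partition exclusion would.
    CREATE INDEX order_details_order_id_idx
        ON order_details (order_id, detail_id);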
The table is a "detail table" to its master records. That is, it's
like an order-details table where it will have a 1-n rows joined to
the master ("order") table on the order-id. So I can partition it
based on the order number pretty easily (which is a bigint, btw).
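For concreteness, a minimal sketch of the inheritance-style
partitioning under discussion; child table names and order_id ranges
are hypothetical:

    -- Parent stays empty; children carry non-overlapping CHECK
    -- constraints so constraint_exclusion can skip partitions.
    CREATE TABLE order_details_p00 (
        CHECK (order_id >= 0        AND order_id < 10000000)
    ) INHERITS (order_details);

    CREATE TABLE order_details_p01 (
        CHECK (order_id >= 10000000 AND order_id < 20000000)
    ) INHERITS (order_details);

    -- Trigger function routes INSERTs on the parent to the right child.
    CREATE OR REPLACE FUNCTION order_details_insert_trigger()
    RETURNS trigger AS $$
    BEGIN
        IF NEW.order_id < 10000000 THEN
            INSERT INTO order_details_p00 VALUES (NEW.*);
        ELSE
            INSERT INTO order_details_p01 VALUES (NEW.*);
        END IF;
        RETURN NULL;  -- row is already stored in the child
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER order_details_insert
        BEFORE INSERT ON order_details
        FOR EACH ROW EXECUTE PROCEDURE order_details_insert_trigger();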
If so, and I can afford some downtime, is dumping the table via
pg_dump and then loading it back in the best way to do this?
To do efficient bulk loading into a partitioned table, you
need to specifically target each partition, rather than
targeting with a trigger. That pretty much rules out pg_dump,
AFAIK, unless you are going to parse the dump file(s) and
rewrite them.
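A sketch of what targeting each partition directly might look like,
with the same hypothetical names (order_details_old stands in for
wherever the existing rows are staged):

    -- Load each child directly so rows skip the routing trigger;
    -- each range must match that child's CHECK constraint.
    INSERT INTO order_details_p00
        SELECT * FROM order_details_old
        WHERE order_id >= 0 AND order_id < 10000000;

    -- Or COPY per-range data files straight into each child:
    -- COPY order_details_p01 FROM '/path/to/order_details_p01.dat';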
Should I run a cluster or vacuum full after all is done?
Probably not. If a cluster after the partitioning would be
beneficial, there would be a pretty good chance you could do a
cluster *instead* of the partitioning and get the same benefit.
I did try clustering the table on the PK (which is actually 4
columns), and it appeared to help a bit. I was hoping partitioning
was going to help me even more.
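For reference, clustering on an existing index is a single statement
(index name hypothetical):

    -- Rewrites the table in index order; takes an exclusive lock
    -- for the duration, so it needs a maintenance window.
    CLUSTER order_details USING order_details_pkey;
    ANALYZE order_details;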
If you do some massive deletes from the parent table as part
of populating the children, then a vacuum full of the parent
could be useful. But if you dump the parent table, truncate it,
and reload it as partitioned tables, then vacuum full would
probably not be useful.
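A sketch of the two cases described above, with the same hypothetical
parent table name:

    -- If children were populated by DELETEing rows out of the parent,
    -- the parent is left bloated and VACUUM FULL reclaims the space:
    VACUUM FULL order_details;

    -- If the parent was dumped and emptied wholesale before the
    -- reload, there is nothing left to reclaim; ONLY skips the children:
    TRUNCATE ONLY order_details;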
Really, you need to identify your most resource-intensive
queries before you can make any reasonable decisions.
Is there a major benefit if I can upgrade to 9.2.x in some way
that I haven't realized?
If you have specific queries that are misoptimized and so are
generating more IO than they need to, then upgrading could help.
On the other hand, it could also make things worse, if a
currently well optimized query becomes worse.
Is there some new feature or optimization you're thinking about with
this comment? If so, could you please just send me a link and/or
feature name and I'll google it myself?
But, instrumentation has improved in 9.2 from 9.0, so
upgrading would make it easier to figure out just which queries
are really bad and have the most opportunity for improvement. A
little well informed optimization might obviate the need for
either partitioning or more hard drives.
This is interesting too. I obviously would like the best available
options to tune the database and the application. Is this detailed
in the release notes somewhere, and what tools could I use to take
advantage of this? (Are there new/improved details included in the
EXPLAIN statement or something?)
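For reference, the kind of instrumentation this likely refers to is
pg_stat_statements, which in 9.2 groups statements by normalized query
text, plus EXPLAIN with buffer counts (available since 9.0). A sketch,
with a hypothetical table and query:

    -- Needs shared_preload_libraries = 'pg_stat_statements' in
    -- postgresql.conf and a server restart.
    CREATE EXTENSION pg_stat_statements;

    -- Top statements by total execution time.
    SELECT query, calls, total_time, rows
    FROM pg_stat_statements
    ORDER BY total_time DESC
    LIMIT 10;

    -- Then dig into a suspect query's I/O behaviour.
    EXPLAIN (ANALYZE, BUFFERS)
    SELECT * FROM order_details WHERE order_id = 12345;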
Finally, if anyone has any comments about my settings listed
above that might help improve performance, I thank you in
advance.
Your default statistics target seemed low. Without knowing
the nature of your most resource intensive queries or how much
memory tomcat is using, it is hard to say more.
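For reference, the statistics target can be raised server-wide or per
column; a sketch with hypothetical names:

    -- Raise the sample size on the hot join/filter column, then
    -- refresh the statistics.
    ALTER TABLE order_details ALTER COLUMN order_id SET STATISTICS 500;
    ANALYZE order_details;

    -- Or raise the server-wide default in postgresql.conf:
    -- default_statistics_target = 100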
Tomcat uses 4G of RAM, plus we have nginx in front using a little,
and some other, smaller services running on the server in addition
to the usual gamut of Linux processes.
Cheers,
Jeff