Keith,
Thanks so much for your response. I just sent a post that has more information on my indexes and how I select data.
I am glad you mentioned data retention. Each day early in the morning I delete data from this table that is older than 18 months. I now see that I should probably vacuum the table after deleting the data. That way any new information that is added is all grouped together.
Your comments were very insightful.
Thanks,
Lance
From: Keith Fiske <keith.fiske@xxxxxxxxxxxxxxx>
Date: Tuesday, October 9, 2018 at 9:35 AM
To: Lance Campbell <lance@xxxxxxxxxxxx>
Cc: "pgsql-admin@xxxxxxxxxxxxxx" <pgsql-admin@xxxxxxxxxxxxxx>
Subject: Re: Should I use table partitions
On Tue, Oct 9, 2018 at 10:19 AM Campbell, Lance <lance@xxxxxxxxxxxx> wrote:
PostgreSQL 10.x
Question:
Should I use table partitions if my table meets the below parameters? The size of the table should remain relatively consistent in size and number of rows. I would not expect it to vary more than 15% at any given time.
Info:
Table rows: 83,051,164
Table size only: 25.6 G
Total size of table including indexes: 38.6 G
Table definition:
CREATE TABLE xyz.abc
(
fk_def_id integer NOT NULL,
id integer NOT NULL DEFAULT nextval(('xyz.abc_id_seq'::text)::regclass),
email_address text,
created_timestamp timestamp with time zone DEFAULT now(),
updated_timestamp timestamp with time zone DEFAULT now(),
provider text,
server_id integer DEFAULT 0,
read_count integer DEFAULT 0,
fk_group_id integer DEFAULT 0,
mail_log_timestamp timestamp with time zone DEFAULT now(),
mail_log_process_code integer DEFAULT '-1'::integer,
mail_log_message text DEFAULT,
mail_log_mail_id text DEFAULT '',
mail_log_rule text DEFAULT '',
is_listserv boolean DEFAULT false,
is_bad_email_address boolean DEFAULT false,
bad_email_address_message text DEFAULT '',
finder text DEFAULT '',
constituent_id text DEFAULT '',
CONSTRAINT abc_pkey PRIMARY KEY (id)
);
Thanks,
Lance Campbell
The big, main reason for partitioning in PostgreSQL is if you have data retention that has to be managed. It makes having to deal with bloat a lot easier since you don't have to do bulk deletes that may never be filled with new data again. If you have no data retention being done now, I wouldn't worry about it until you start getting closer to 1 billion rows or more. That's when vacuuming run times may start to be a concern and partitioning can help reduce those since the individual objects are smaller to manage. Also, if there's a lot of static data, and you're on 9.6+, you can greatly improve vacuum times by running a VACUUM FREEZE on the table. That added a feature to allow vacuum to skip pages that are fully frozen.
Query read performance is really a secondary concern for partitioning in PG, especially if you're not managing data retention. Btree lookups are FAR more efficient than constraint exclusion (partition pruning in PG11+) will ever be. And if you're only doing more narrow data lookups, you'll actually see a performance decrease in queries since removing partitions in the planner adds overhead. That overhead isn't as big of a concern if there's data retention to manage, though, since bloat will be a bigger impact than the planner overhead. If you're grabbing larger blocks of data at one time, that's when partitioning may start helping with reads more since it has to pull fewer pages from disk.
--
Keith Fiske
Senior Database Engineer
Crunchy Data - http://crunchydata.com
On Tue, Oct 9, 2018 at 10:43 AM Campbell, Lance <lance@xxxxxxxxxxxx> wrote:
Note that a normal VACUUM will not re-arrange the data on disk to group it together. Only a VACUUM FULL will do something like that and even then it doesn't really group similar data close together. You're probably thinking of CLUSTER. And the clustering is not automatically retained and has to be re-run periodically. But both VACUUM FULL and CLUSTER require a full lock on the table involved that blocks all access until they are done running.
I wrote more about managing bloat in my blog and what vacuuming actually does if you're interested - https://www.keithf4.com/checking-for-postgresql-bloat/
Before going down the partitioning road, I would first recommend getting some bloat monitoring in place, to see if it's even a concern. If you're removing the data fairly often and not in huge chunks at a time, your deletion/ingest rate may be balanced out. If you are seeing bloat becoming a concern, I'd recommend the smallest interval at monthly. You may even want to try yearly since your retention period is longer. The fewer partitions there are, the better.
Keith