Re: Should I use table partitions

"Campbell, Lance" <lance@xxxxxxxxxxxx> · Tue, 9 Oct 2018 14:43:37 +0000

Keith,
Thanks so much for your response.  I just sent a post that has more information on my indexes and how I select data.

I am glad you mentioned data retention.  Each day early in the morning I delete data from this table that is older than 18 months. I now see that I should probably vacuum the table after deleting the data.  That way any new information
 that is added is all grouped together. 

Your comments were very insightful.

Thanks,

Lance

From: Keith Fiske <keith.fiske@xxxxxxxxxxxxxxx>

Date: Tuesday, October 9, 2018 at 9:35 AM

To: Lance Campbell <lance@xxxxxxxxxxxx>

Cc: "pgsql-admin@xxxxxxxxxxxxxx" <pgsql-admin@xxxxxxxxxxxxxx>

Subject: Re: Should I use table partitions

On Tue, Oct 9, 2018 at 10:19 AM Campbell, Lance <lance@xxxxxxxxxxxx> wrote:

PostgreSQL 10.x

Question:
Should I use table partitions if my table meets the below parameters?  The size of the table should remain relatively consistent in size and number of
 rows.  I would not expect it to vary more than 15% at any given time.

Info:
Table rows: 83,051,164
Table size only: 25.6 G
Total size of table including indexes: 38.6 G
Table definition:

CREATE TABLE xyz.abc

(

    fk_def_id integer NOT NULL,

    id integer NOT NULL DEFAULT nextval(('xyz.abc_id_seq'::text)::regclass),

    email_address text,

    created_timestamp timestamp with time zone DEFAULT now(),

    updated_timestamp timestamp with time zone DEFAULT now(),

    provider text,

    server_id integer DEFAULT 0,

    read_count integer DEFAULT 0,

    fk_group_id integer DEFAULT 0,

    mail_log_timestamp timestamp with time zone DEFAULT now(),

    mail_log_process_code integer DEFAULT '-1'::integer,

    mail_log_message text DEFAULT,

    mail_log_mail_id text DEFAULT '',

    mail_log_rule text DEFAULT '',

    is_listserv boolean DEFAULT false,

    is_bad_email_address boolean DEFAULT false,

    bad_email_address_message text DEFAULT '',

    finder text DEFAULT '',

    constituent_id text DEFAULT '',

    CONSTRAINT abc_pkey PRIMARY KEY (id)

);

Thanks,

Lance Campbell

The big, main reason for partitioning in PostgreSQL is if you have data retention that has to be managed. It makes having to deal with bloat a lot easier since you don't have to do bulk deletes that may never be filled with new data again.
 If you have no data retention being done now, I wouldn't worry about it until you start getting closer to 1 billion rows or more. That's when vacuuming run times may start to be a concern and partitioning can help reduce those since the individual objects
 are smaller to manage. Also, if there's a lot of static data, and you're on 9.6+, you can greatly improve vacuum times by running a VACUUM FREEZE on the table. That added a feature to allow vacuum to skip pages that are fully frozen.

Query read performance is really a secondary concern for partitioning in PG, especially if you're not managing data retention. Btree lookups are FAR more efficient than constraint exclusion (partition pruning in PG11+) will ever be. And
 if you're only doing more narrow data lookups, you'll actually see a performance decrease in queries since removing partitions in the planner adds overhead. That overhead isn't as big of a concern if there's data retention to manage, though, since bloat will
 be a bigger impact than the planner overhead. If you're grabbing larger blocks of data at one time, that's when partitioning may start helping with reads more since it has to pull fewer pages from disk.

-- 

Keith Fiske

Senior Database Engineer

Crunchy Data - http://crunchydata.com