Re: FW: Constraint exclusion in partitions

Francisco Olarte <folarte@xxxxxxxxxxxxxx> · Sat, 23 May 2015 12:53:29 +0200

Hi Daniel

On Fri, May 22, 2015 at 7:21 PM, Daniel Begin <jfd553@xxxxxxxxxxx> wrote:
> I have split a large table (billions of records) into multiple partitions,
> hoping the access would be faster. I used an ID to make partitions check
> (check (id >= 100 AND id < 200)…) and created over 80 tables (children) that
> are now filled with data.
...
> I understand that the following query will use constraint exclusion and will
> run faster…
> a-      Select * from parent_table where id >=9999; -- using a constant
> But how constraint exclusion would react with the following queries …
> b-      Select * from parent_table where id between 2345 and 6789; -- using
> a range of ids
> c-       Select * from parent_table where id in(select ids from
> anothertable); -- using a list of ids from a select

Given you have already partitioned it, why don't you just use explain
[ analyze ] on the queries? I.e., in one of my tables, partitioned
monthly by a timestamp ( with time zone ) field I get ( even if I
never use between, a closed interval, for a continuous like type like
timestamp, the optimizer clearly shows it's transfroaming it to the
equivalent AND condition ):

explain select * from carrier_cdrs where setup between
'20150107T123456' and '20150322T222222';

QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------
 Append  (cost=0.00..82202.41 rows=2346599 width=74)
   ->  Seq Scan on carrier_cdrs  (cost=0.00..0.00 rows=1 width=184)
         Filter: ((setup >= '2015-01-07 12:34:56+01'::timestamp with
time zone) AND (setup <= '2015-03-22 22:22:22+01'::timestamp with time
zone))
   ->  Seq Scan on carrier_cdrs_201501  (cost=0.00..30191.10
rows=816551 width=74)
         Filter: ((setup >= '2015-01-07 12:34:56+01'::timestamp with
time zone) AND (setup <= '2015-03-22 22:22:22+01'::timestamp with time
zone))
   ->  Seq Scan on carrier_cdrs_201502  (cost=0.00..25277.45
rows=872830 width=74)
         Filter: ((setup >= '2015-01-07 12:34:56+01'::timestamp with
time zone) AND (setup <= '2015-03-22 22:22:22+01'::timestamp with time
zone))
   ->  Seq Scan on carrier_cdrs_201503  (cost=0.00..26733.85
rows=657217 width=74)
         Filter: ((setup >= '2015-01-07 12:34:56+01'::timestamp with
time zone) AND (setup <= '2015-03-22 22:22:22+01'::timestamp with time
zone))
(9 rows)

> Since I mostly use queries of type b and c, I am wondering if partitioning
> the large table was appropriate and if the queries are going to be longer to
> run…

The problem is gonna be all the extra conditions, so I'll check real
queries. My bet is in a query EXACTLY like b it will use constraint
exclusion, and can be potentially faster if your interval are for just
100 ids, but why speculate when YOU can measure?

Also think if you touch 67-23+1=45 partitions and the DB has to check
other indexed fields it is a lot of work. As I said, it depend on your
actual data, actual indexes, and actual queries, just measure it. In
my excample table I partition the data by the TS, which is the single
indexed field and my tests showed it was faster that way ( but my
queries are normally big scans of date ranges or more selective ones
with narrow TS conditions, and I measured them ).

Francisco Olarte.

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general