Re: FW: Constraint exclusion in partitions

Daniel Begin <jfd553@xxxxxxxxxxx> · Sat, 23 May 2015 18:16:43 -0400

Hello Bill, 
You wrote that my testing methodology is flawed - I hope you are right!  

However, I am a bit confused about your comments. Yes, I did edited the name
of the tables for clarity but if I miss the point I, I will do it again as I
am writing without modifying anything. Here is the procedure I follow and
results...

I use pgadmin_III sql window. I write the following query (I have changed
the id to make sure it does not use previous results still in memory)...

Select * from nodes where id=345678912; -- nodes is the real partitioned
table name

Now I select "explain query" from the menu and I get the following result...
"Append  (cost=0.00..384.08 rows=99 width=66)"
"  ->  Seq Scan on nodes  (cost=0.00..0.00 rows=1 width=66)"
"        Filter: (id = 345678912)"
"  ->  Index Scan using nodes19_idversion_pk on nodes_19  (cost=0.56..384.08
rows=98 width=66)"
"        Index Cond: (id = 345678912)"

Now, I select "run" and I get one record as a result and the following
message in history tab...
-- Executing query:
Select * from nodes where id=345678912; 
Total query runtime: 62 ms.
1 row retrieved.

Now, if I use the same query on the original table using the same procedure,
here is what I get...
Select * from old_nodes where id=345678912; -- old_nodes is the real
original table name

Explain gives me the following
"Index Scan using nodes_idversion_pk on old_nodes  (cost=0.70..4437.15
rows=1682 width=66)"
"  Index Cond: (id = 345678912)"

Running the query gives me the same record with the following message in
history tab...
-- Executing query:
select * from old_nodes where id=345678912; 
Total query runtime: 62 ms.
1 row retrieved.

This time, the history tab shows that both took the same time to run (an
improvement!?)
Let's try this one using the same procedure...

Select * from old_nodes where id IN
(10050000,1000050000,2000050000,3000050000)
"Index Scan using nodes_idversion_pk on old_nodes  (cost=0.71..17739.18
rows=6726 width=66)"
"  Index Cond: (id = ANY
('{10050000,1000050000,2000050000,3000050000}'::bigint[]))"
-- Executing query:
Select * from old_nodes where id IN
(10050000,1000050000,2000050000,3000050000)
Total query runtime: 171 ms.
5 rows retrieved.

Select * from nodes where id IN (10050000,1000050000,2000050000,3000050000)
"Append  (cost=0.00..933.40 rows=223 width=66)"
"  ->  Seq Scan on nodes  (cost=0.00..0.00 rows=1 width=66)"
"        Filter: (id = ANY
('{10050000,1000050000,2000050000,3000050000}'::bigint[]))"
"  ->  Index Scan using nodes01_idversion_pk on nodes_01  (cost=0.57..622.78
rows=156 width=66)"
"        Index Cond: (id = ANY
('{10050000,1000050000,2000050000,3000050000}'::bigint[]))"
"  ->  Index Scan using nodes38_idversion_pk on nodes_38  (cost=0.57..138.25
rows=31 width=66)"
"        Index Cond: (id = ANY
('{10050000,1000050000,2000050000,3000050000}'::bigint[]))"
"  ->  Index Scan using nodes63_idversion_pk on nodes_63  (cost=0.57..119.01
rows=26 width=66)"
"        Index Cond: (id = ANY
('{10050000,1000050000,2000050000,3000050000}'::bigint[]))"
"  ->  Index Scan using nodes85_idversion_pk on nodes_85  (cost=0.57..53.37
rows=9 width=66)"
"        Index Cond: (id = ANY
('{10050000,1000050000,2000050000,3000050000}'::bigint[]))"
-- Executing query:
Select * from nodes where id IN (10050000,1000050000,2000050000,3000050000)
Total query runtime: 140 ms.
5 rows retrieved.

This time the history tab shows that the query was shorter to run on
partitioned table (a real improvement!?)

I know, this is different from what I referred to in my original email (at
least both shows similar running time) but I swear, I did not change the
times when editing table names!-)

Do you see any glitch/flaw in the procedure I am using?
Someone has an idea about the problem behind this unexpected behavior? 

I really need to get much faster results with my queries on this large table
and partitioning was my last option...

Best regards,
Daniel

-----Original Message-----
From: pgsql-general-owner@xxxxxxxxxxxxxx
[mailto:pgsql-general-owner@xxxxxxxxxxxxxx] On Behalf Of Bill Moran
Sent: May-23-15 15:23
To: Daniel Begin
Cc: pgsql-general@xxxxxxxxxxxxxx
Subject: Re:  FW: Constraint exclusion in partitions

A large portion of why you describe below is the exact opposite of my own
testing (unfortunately, I don't have the actual test results any more
because I did the tests for a former employer).

In my tests, single lookups against the same column being used to partition
improved performance in direct proportion to the number of partitions. I.e.
if the tables are partitioned on id, and the lookup is for id, and the table
has 10 partitions, the query is 10x faster on the partitioned version than
the non-partitioned verison.

Queries against indexes not partitioned were slightly slower in my tests,
but nowhere near the degree that you're showing below.

I can't help but think that your testing methodology is flawed, but since
you're not showing us what you actually did, it's difficult to be sure. See
below for some specifics on what I'm concerned that you might be doing wrong
...

On Sat, 23 May 2015 14:37:25 -0400
Daniel Begin <jfd553@xxxxxxxxxxx> wrote:

> Following Francisco suggestion, I was able to do some tests earlier this
morning when the partitioning process completed and all the resulting tables
analyzed.
> 
> Here is what I got on both the original table and its partitioned
counterpart while running the same queries. I tested them only for a couple
of values but in summary...
> 
> Using a constant id: 
> All the queries I tried took longer on the partitioned table! I got 
> similar results for multiple records using IN (id value1, id value2 
> ...)
> 
> Using a range of ids: 
> Surprisingly again, all the queries I tried took longer on the partitioned
table!
> 
> Using a list of ids from a select clause: 
> More surprisingly, the queries I tried took less time on the partitioned
table at least when using the primary key. Using an indexed field took so
long compared to the old table that I cancelled the execution for the new
one!
> 
> Guess what, I will get back to my old fat table unless someone tells me I
missed something obvious!
> Daniel
> 
> Note: Tables/indexes description, queries and execution plans are below.
> 
> 
> 
> 
> 
> 
> Tables/indexes description 
> ----------------------------------------------------------------------
> ------------ The original table has 3870130000 records. Primary 
> key/index on each partition queries are ALTER TABLE oldtable ADD 
> CONSTRAINT oldtable_idversion_pk PRIMARY KEY (id, version); CREATE 
> INDEX oldtable_groupid_idx ON oldtable USING btree (group_id);
> 
> The partitioned table has 3870130000 records distributed over 87 
> partitions. Primary key/index on each partition queries are ALTER 
> TABLE newtable_xx ADD CONSTRAINT newtablexx_idversion_pk PRIMARY KEY 
> (id, version); CREATE INDEX newtablexx_groupid_idx ON newtable_xx 
> USING btree (group_id); Where xx is the partition's number suffix
> 
> constant id 
> ----------------------------------------------------------------------
> --------------------------------- select * from oldtable where 
> id=123456789; "Index Scan using oldtable_idversion_pk on oldtable  
> (cost=0.70..4437.15 rows=1682 width=66)"
> "  Index Cond: (id = 123456789::bigint)"
> --Total query runtime: 62 ms. 1 rows retrieved
> 
> 
> select * from newtable where id=123456789; "Append  (cost=0.00..20.19 
> rows=5 width=66)"
> "  ->  Seq Scan on newtable  (cost=0.00..0.00 rows=1 width=66)"
> "        Filter: (id = 123456789::bigint)"
> "  ->  Index Scan using newtable72_idversion_pk on newtable_72
(cost=0.56..20.19 rows=4 width=66)"
> "        Index Cond: (id = 123456789::bigint)"
> --Total query runtime: 156 ms. 1 rows retrieved

Notice in these results that you're not showing the command that was
executed.
The output is mostly likely from the command "ANALYZE select * from newtable
where id=123456789;" but that's not the command you claim that you ran. In
any event, the analyze output doesn't line up with the times you claim: i.e.
Analyze is showing that the first query should take about 4437 time units to
complete, and the second one should take about 20 time units, yet you claim
the second one is slower. The other queries below exhibit a similar pattern.

Are you sure you're not timing ANALYZE itself instead of the query? Because
timing "ANALYZE select * from newtable where id=123456789;" is not going to
be timing the actual time the query took to run. I would certainly expect
the _planning_ of a query against partitioned tables to take longer than
non- partitioned, but I would also expect the execution time to be the
opposite.
Hence my theory that you've accidentally timed the ANALYZE instead of the
actual running of the query. Naturally, the total query time is planning +
execution, and my experience shows that the loss in planning speed is more
than made up for by the gain in execution speed.

Perhaps you should show us the exact output of one of your tests, without
editorializing.

> I got similar results for multiple records...
> select * from oldtable where id 
> IN(10000000,1000000000,2000000000,3000000000);
> "Index Scan using oldtable_idversion_pk on oldtable  (cost=0.71..17739.18
rows=6726 width=66)"
> "  Index Cond: (id = ANY
('{10000000,1000000000,2000000000,3000000000}'::bigint[]))"
> --Total query runtime: 187 ms. 4 rows retrieved
> 
> select * from newtable where id 
> IN(10000000,1000000000,2000000000,3000000000);
> "Append  (cost=0.00..933.40 rows=223 width=66)"
> "  ->  Seq Scan on newtable  (cost=0.00..0.00 rows=1 width=66)"
> "        Filter: (id = ANY
('{10000000,1000000000,2000000000,3000000000}'::bigint[]))"
> "  ->  Index Scan using newtable01_idversion_pk on newtable_01
(cost=0.57..622.78 rows=156 width=66)"
> "        Index Cond: (id = ANY
('{10000000,1000000000,2000000000,3000000000}'::bigint[]))"
> ...
> "  ->  Index Scan using newtable85_idversion_pk on newtable_85
(cost=0.57..53.37 rows=9 width=66)"
> "        Index Cond: (id = ANY
('{10000000,1000000000,2000000000,3000000000}'::bigint[]))"
> --Total query runtime: 421 ms. 4 rows retrieved
> 
> 
> range of ids 
> ----------------------------------------------------------------------
> --------------------------------- select * from oldtable where id 
> between 1522999949 and 1523000049; "Index Scan using 
> oldtable_idversion_pk on oldtable  (cost=0.70..383.51 rows=144 width=66)"
> "  Index Cond: ((id >= 1522999949) AND (id <= 1523000049))"
> Total query runtime: 47 ms. 53 rows retrieved.
> 
> select * from newtable where id between 1522999949 and 1523000049; 
> "Append  (cost=0.00..408.16 rows=104 width=66)"
> "  ->  Seq Scan on newtable  (cost=0.00..0.00 rows=1 width=66)"
> "        Filter: ((id >= 1522999949) AND (id <= 1523000049))"
> "  ->  Index Scan using newtable51_idversion_pk on newtable_51
(cost=0.56..183.52 rows=46 width=66)"
> "        Index Cond: ((id >= 1522999949) AND (id <= 1523000049))"
> "  ->  Index Scan using newtable52_idversion_pk on newtable_52
(cost=0.56..224.64 rows=57 width=66)"
> "        Index Cond: ((id >= 1522999949) AND (id <= 1523000049))"
> Total query runtime: 78 ms. 53 rows retrieved.
> 
> 
> list of ids from a select clause 
> ----------------------------------------------------------------------
> --------------------------------- --Subset provides 4 ids similar but 
> not identical to the previous query select * from oldtable where id IN 
> (select * from subset); "Nested Loop  (cost=37.45..886298.00 
> rows=2028512050 width=66)"
> "  ->  HashAggregate  (cost=36.75..38.75 rows=200 width=8)"
> "        ->  Seq Scan on subset  (cost=0.00..31.40 rows=2140 width=8)"
> "  ->  Index Scan using oldtable_idversion_pk on oldtable
(cost=0.70..4414.37 rows=1693 width=66)"
> "        Index Cond: (id = subset.id)"
> Total query runtime: 171 ms. 4 rows retrieved.
> 
> select * from newtable where id IN (select * from subset) "Nested Loop  
> (cost=36.75..1407672.76 rows=1935067087 width=66)"
> "  ->  HashAggregate  (cost=36.75..38.75 rows=200 width=8)"
> "        ->  Seq Scan on subset  (cost=0.00..31.40 rows=2140 width=8)"
> "  ->  Append  (cost=0.00..7020.68 rows=1749 width=66)"
> "        ->  Seq Scan on newtable  (cost=0.00..0.00 rows=1 width=66)"
> "              Filter: (subset.id = id)"
> "        ->  Index Scan using newtable01_idversion_pk on newtable_01
(cost=0.56..151.97 rows=39 width=66)"
> "              Index Cond: (id = subset.id)"
> ...
> "        ->  Index Scan using newtable86_idversion_pk on newtable_86
(cost=0.56..12.42 rows=2 width=66)"
> "              Index Cond: (id = subset.id)"
> Total query runtime: 140 ms. 4 rows retrieved.
> 
> 
> Using an index, not the primary key 
> ----------------------------------------------------------------------
> -------- --Subset provides 58 group_id pointing to 5978 records in the 
> concerned tables select * from oldtable where group_id IN (select * 
> from subset) "Nested Loop  (cost=37.33..21575715.89 rows=2028512050 
> width=66)"
> "  ->  HashAggregate  (cost=36.75..38.75 rows=200 width=8)"
> "        ->  Seq Scan on subset  (cost=0.00..31.40 rows=2140 width=8)"
> "  ->  Index Scan using oldtable_groupid_idx on oldtable
(cost=0.58..107364.99 rows=51340 width=66)"
> "        Index Cond: (group_id = subset.id)"
> Total query runtime: 3986 ms. 5978 rows retrieved.
> 
> 
> select * from newtable where group_id IN (select * from subset) "Hash 
> Join  (cost=41.25..138092255.85 rows=1935067087 width=66)"
> "  Hash Cond: (newtable.group_id = subset.id)"
> "  ->  Append  (cost=0.00..84877869.72 rows=3870134173 width=66)"
> "        ->  Seq Scan on newtable  (cost=0.00..0.00 rows=1 width=66)"
> "        ->  Seq Scan on newtable_01  (cost=0.00..946235.96 rows=46526896
width=66)"
> ...
> "        ->  Seq Scan on newtable_86  (cost=0.00..986527.64 rows=44269664
width=66)"
> "  ->  Hash  (cost=38.75..38.75 rows=200 width=8)"
> "        ->  HashAggregate  (cost=36.75..38.75 rows=200 width=8)"
> "              ->  Seq Scan on subset  (cost=0.00..31.40 rows=2140
width=8)"
> Execution Cancelled after 766702 ms !
> 
> I tried the same with "SET enable_seqscan = OFF" and got an index scan 
> of all tables;
> 
> 
> 
> --
> Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx) To 
> make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general

--
Bill Moran

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general