Search Postgresql Archives

Re: FW: Constraint exclusion in partitions

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Thank for your patience :-)

- About using PgAdmin, anecdotal problems or not, I did the whole tests again in plain postgresql.

- About running queries once or not, Bill and Francisco both pointed out somehow that I should run each query multiple times to get appropriate statistics. I did it for all queries - First trial always longer, all other stabilise around the same values. The EXPLAIN ANALYSE for first and second trial on each query I ran on original table and on the partitioned one can be found below.

However, in my case, I will have to run most of my queries only once since I simply need to extract sample data for a research topic - there is no insert/update in the DB (that is why I thought looking at first trial was more appropriate). 

- About adding the exclusion check constraint, thank for remembering me such a simple thing that could have caused all this!-) but sadly, it was set to "partition", as expected. However, I have decided to run all the queries after having set the parameter to ON and restarted the database, just in case.

Even after doing all this, I did not find any improvement in execution times between my original fat table and the partitioned version (sometime even worst). If partitioning the table has improved significantly queries running times, I could have partitioned the tables differently to accommodate other query types I will have to run later in my research (I have the same problem for half a dozen tables). 

Since it does not seem that partitioning will do the job, I will get back to the original table to run my queries...

However, just in case someone knows a magical trick that can improve significantly the speed of my queries (but haven't told me yet!-) here are the details about the concerned table/indexes 

(Using https://wiki.postgresql.org/wiki/Index_Maintenance query ...)
Number of records: 3870130000
Table size: 369GB
Indexes size: 425GB
 - nodes_idversion_pk: 125GB 
 - nodes_changesetid_idx: 86GB
 - nodes_geom_idx: 241GB

Each record has 3 bigint, 2 boolean, 1 timestamp and 1 geography type.

I am running all this on my personal PC:  Windows 64b, i7 chip, 16GB ram.
I am using PostgreSQL 9.3 and the database cluster is spread over 2X3TB external drives with write caching.

Best regards,
Daniel


Results/explain/analyse follow...

--Constant ids-------------------------------------------------------------------------------------------------------------------------------------------------
--Explain analyse on original table for a query that will look into different partitions on the new table
--First attempt;
db=# EXPLAIN ANALYSE SELECT * FROM old_nodes WHERE id IN(10005000,1000005000,2000005000,3000005000);
 Index Scan using nodes_idversion_pk on old_nodes  (cost=0.71..17739.18 rows=6726 width=66) (actual time=52.226..288.700 rows=6 loops=1)
   Index Cond: (id = ANY ('{10005000,1000005000,2000005000,3000005000}'::bigint[]))
 Total runtime: 288.732 ms
--Second attempt;
db=# EXPLAIN ANALYSE SELECT * FROM old_nodes WHERE id IN(10005000,1000005000,2000005000,3000005000);
 Index Scan using nodes_idversion_pk on old_nodes  (cost=0.71..17739.18 rows=6726 width=66) (actual time=0.014..0.035 rows=6 loops=1)
   Index Cond: (id = ANY ('{10005000,1000005000,2000005000,3000005000}'::bigint[]))
 Total runtime: 0.056 ms

-- Explain analyse on partitioned table for a query that will look into different partitions
--First attempt;
db=# EXPLAIN ANALYSE SELECT * FROM nodes WHERE id IN(10005000,1000005000,2000005000,3000005000); 
 Append  (cost=0.00..933.40 rows=223 width=66) (actual time=108.903..287.068 rows=6 loops=1)
   ->  Seq Scan on nodes  (cost=0.00..0.00 rows=1 width=66) (actual time=0.001..0.001 rows=0 loops=1)
         Filter: (id = ANY ('{10005000,1000005000,2000005000,3000005000}'::bigint[]))
   ->  Index Scan using nodes01_idversion_pk on nodes_01  (cost=0.57..622.78 rows=156 width=66) (actual time=108.900..108.916 rows=1 loops=1)
         Index Cond: (id = ANY ('{10005000,1000005000,2000005000,3000005000}'::bigint[]))
   ->  Index Scan using nodes38_idversion_pk on nodes_38  (cost=0.57..138.25 rows=31 width=66) (actual time=89.523..89.543 rows=1 loops=1)
         Index Cond: (id = ANY ('{10005000,1000005000,2000005000,3000005000}'::bigint[]))
   ->  Index Scan using nodes63_idversion_pk on nodes_63  (cost=0.57..119.01 rows=26 width=66) (actual time=49.978..49.998 rows=3 loops=1)
         Index Cond: (id = ANY ('{10005000,1000005000,2000005000,3000005000}'::bigint[]))
   ->  Index Scan using nodes85_idversion_pk on nodes_85  (cost=0.57..53.37 rows=9 width=66) (actual time=38.600..38.603 rows=1 loops=1)
         Index Cond: (id = ANY ('{10005000,1000005000,2000005000,3000005000}'::bigint[]))
 Total runtime: 287.144 ms
--Second attempt;
db=# EXPLAIN ANALYSE SELECT * FROM nodes WHERE id IN(10005000,1000005000,2000005000,3000005000); 
 Append  (cost=0.00..933.40 rows=223 width=66) (actual time=0.012..0.065 rows=6 loops=1)
   ->  Seq Scan on nodes  (cost=0.00..0.00 rows=1 width=66) (actual time=0.001..0.001 rows=0 loops=1)
         Filter: (id = ANY ('{10005000,1000005000,2000005000,3000005000}'::bigint[]))
   ->  Index Scan using nodes01_idversion_pk on nodes_01  (cost=0.57..622.78 rows=156 width=66) (actual time=0.010..0.017 rows=1 loops=1)
         Index Cond: (id = ANY ('{10005000,1000005000,2000005000,3000005000}'::bigint[]))
   ->  Index Scan using nodes38_idversion_pk on nodes_38  (cost=0.57..138.25 rows=31 width=66) (actual time=0.010..0.015 rows=1 loops=1)
         Index Cond: (id = ANY ('{10005000,1000005000,2000005000,3000005000}'::bigint[]))
   ->  Index Scan using nodes63_idversion_pk on nodes_63  (cost=0.57..119.01 rows=26 width=66) (actual time=0.012..0.016 rows=3 loops=1)
         Index Cond: (id = ANY ('{10005000,1000005000,2000005000,3000005000}'::bigint[]))
   ->  Index Scan using nodes85_idversion_pk on nodes_85  (cost=0.57..53.37 rows=9 width=66) (actual time=0.013..0.013 rows=1 loops=1)
         Index Cond: (id = ANY ('{10005000,1000005000,2000005000,3000005000}'::bigint[]))
 Total runtime: 0.125 ms

--Explain analyse on original table for a query that will look into one partition on the new table
--First attempt;
db=# EXPLAIN ANALYSE SELECT * FROM old_nodes WHERE id IN(723005000,733005000,743005000,753005000);
 Index Scan using nodes_idversion_pk on old_nodes  (cost=0.71..17739.18 rows=6726 width=66) (actual time=37.366..158.445 rows=4 loops=1)
   Index Cond: (id = ANY ('{723005000,733005000,743005000,753005000}'::bigint[]))
 Total runtime: 158.479 ms
--Second attempt;
db=# EXPLAIN ANALYSE SELECT * FROM old_nodes WHERE id IN(723005000,733005000,743005000,753005000);
 Index Scan using nodes_idversion_pk on old_nodes  (cost=0.71..17739.18 rows=6726 width=66) (actual time=0.014..0.032 rows=4 loops=1)
   Index Cond: (id = ANY ('{723005000,733005000,743005000,753005000}'::bigint[]))
 Total runtime: 0.054 ms

--Explain analyse on partitioned table for a query that will look into one partition
--First attempt;
db=# EXPLAIN ANALYSE SELECT * FROM nodes WHERE id IN(723005000,733005000,743005000,753005000); 
 Append  (cost=0.00..196.84 rows=47 width=66) (actual time=163.898..441.497 rows=4 loops=1)
   ->  Seq Scan on nodes  (cost=0.00..0.00 rows=1 width=66) (actual time=0.001..0.001 rows=0 loops=1)
         Filter: (id = ANY ('{723005000,733005000,743005000,753005000}'::bigint[]))
   ->  Index Scan using nodes31_idversion_pk on nodes_31  (cost=0.57..196.84 rows=46 width=66) (actual time=163.894..441.491 rows=4 loops=1)
         Index Cond: (id = ANY ('{723005000,733005000,743005000,753005000}'::bigint[]))
 Total runtime: 441.549 ms
--Second attempt;
db=# EXPLAIN ANALYSE SELECT * FROM nodes WHERE id IN(723005000,733005000,743005000,753005000); 
 Append  (cost=0.00..196.84 rows=47 width=66) (actual time=0.011..0.027 rows=4 loops=1)
   ->  Seq Scan on nodes  (cost=0.00..0.00 rows=1 width=66) (actual time=0.001..0.001 rows=0 loops=1)
         Filter: (id = ANY ('{723005000,733005000,743005000,753005000}'::bigint[]))
   ->  Index Scan using nodes31_idversion_pk on nodes_31  (cost=0.57..196.84 rows=46 width=66) (actual time=0.009..0.025 rows=4 loops=1)
         Index Cond: (id = ANY ('{723005000,733005000,743005000,753005000}'::bigint[]))
 Total runtime: 0.062 ms


--Range ids-------------------------------------------------------------------------------------------------------------------------------------------------
--Explain analyse on original table for a query that will look into different partitions on the new table
--First attempt;
db=# EXPLAIN ANALYSE SELECT * FROM old_nodes WHERE id BETWEEN 1522999949 AND 1523000049; 
 Index Scan using nodes_idversion_pk on old_nodes  (cost=0.70..383.51 rows=144 width=66) (actual time=73.115..180.769 rows=53 loops=1)
   Index Cond: ((id >= 1522999949) AND (id <= 1523000049))
 Total runtime: 180.820 ms
--Second attempt;
db=# EXPLAIN ANALYSE SELECT * FROM old_nodes WHERE id BETWEEN 1522999949 AND 1523000049; 
 Index Scan using nodes_idversion_pk on old_nodes  (cost=0.70..383.51 rows=144 width=66) (actual time=0.020..0.039 rows=53 loops=1)
   Index Cond: ((id >= 1522999949) AND (id <= 1523000049))
 Total runtime: 0.060 ms

--Explain analyse on partitioned table for a query that will look into different partitions
--First attempt;
db=# EXPLAIN ANALYSE SELECT * FROM nodes WHERE id BETWEEN 1522999949 AND 1523000049; 
 Append  (cost=0.00..408.16 rows=104 width=66) (actual time=0.014..46.196 rows=53 loops=1)
   ->  Seq Scan on nodes  (cost=0.00..0.00 rows=1 width=66) (actual time=0.001..0.001 rows=0 loops=1)
         Filter: ((id >= 1522999949) AND (id <= 1523000049))
   ->  Index Scan using nodes51_idversion_pk on nodes_51  (cost=0.56..183.52 rows=46 width=66) (actual time=0.012..20.216 rows=18 loops=1)
         Index Cond: ((id >= 1522999949) AND (id <= 1523000049))
   ->  Index Scan using nodes52_idversion_pk on nodes_52  (cost=0.56..224.64 rows=57 width=66) (actual time=0.022..25.973 rows=35 loops=1)
         Index Cond: ((id >= 1522999949) AND (id <= 1523000049))
 Total runtime: 46.254 ms
--Second attempt;
db=# EXPLAIN ANALYSE SELECT * FROM nodes WHERE id BETWEEN 1522999949 AND 1523000049; 
 Append  (cost=0.00..408.16 rows=104 width=66) (actual time=0.010..0.038 rows=53 loops=1)
   ->  Seq Scan on nodes  (cost=0.00..0.00 rows=1 width=66) (actual time=0.001..0.001 rows=0 loops=1)
         Filter: ((id >= 1522999949) AND (id <= 1523000049))
   ->  Index Scan using nodes51_idversion_pk on nodes_51  (cost=0.56..183.52 rows=46 width=66) (actual time=0.008..0.015 rows=18 loops=1)
         Index Cond: ((id >= 1522999949) AND (id <= 1523000049))
   ->  Index Scan using nodes52_idversion_pk on nodes_52  (cost=0.56..224.64 rows=57 width=66) (actual time=0.006..0.017 rows=35 loops=1)
         Index Cond: ((id >= 1522999949) AND (id <= 1523000049))
 Total runtime: 0.081 ms


--Select ids-------------------------------------------------------------------------------------------------------------------------------------------------
--Explain analyse on original table for a query that will look into one partition on the new table but list of ids provided through a select statement
--First attempt;
db=# EXPLAIN ANALYSE SELECT * FROM old_nodes WHERE id IN(SELECT * FROM subset); 
 Nested Loop  (cost=99.93..884823.94 rows=2028512050 width=66) (actual time=97.489..2289.772 rows=5979 loops=1)
   ->  HashAggregate  (cost=99.22..101.22 rows=200 width=8) (actual time=2.155..3.649 rows=5941 loops=1)
         ->  Seq Scan on subset  (cost=0.00..84.78 rows=5778 width=8) (actual time=0.018..0.581 rows=5978 loops=1)
   ->  Index Scan using nodes_idversion_pk on old_nodes  (cost=0.70..4406.68 rows=1693 width=66) (actual time=0.384..0.384 rows=1 loops=5941)
         Index Cond: (id = subset.id)
 Total runtime: 2290.122 ms
--Second attempt;
db=# EXPLAIN ANALYSE SELECT * FROM old_nodes WHERE id IN(SELECT * FROM subset); 
 Nested Loop  (cost=99.93..884823.94 rows=2028512050 width=66) (actual time=1.785..25.730 rows=5979 loops=1)
   ->  HashAggregate  (cost=99.22..101.22 rows=200 width=8) (actual time=1.767..2.661 rows=5941 loops=1)
         ->  Seq Scan on subset  (cost=0.00..84.78 rows=5778 width=8) (actual time=0.009..0.373 rows=5978 loops=1)
   ->  Index Scan using nodes_idversion_pk on old_nodes  (cost=0.70..4406.68 rows=1693 width=66) (actual time=0.003..0.003 rows=1 loops=5941)
         Index Cond: (id = subset.id)
 Total runtime: 26.005 ms

--Explain analyse on partitioned table for a query that will look into one partition but list of ids provided through a select statement
--First attempt;
db=# EXPLAIN ANALYSE SELECT * FROM nodes WHERE id IN(SELECT * FROM subset); 
 Nested Loop  (cost=99.22..1403193.39 rows=1935067087 width=66) (actual time=12146.666..19140.901 rows=5979 loops=1)
   ->  HashAggregate  (cost=99.22..101.22 rows=200 width=8) (actual time=1.998..4.496 rows=5941 loops=1)
         ->  Seq Scan on subset  (cost=0.00..84.78 rows=5778 width=8) (actual time=0.010..0.390 rows=5978 loops=1)
   ->  Append  (cost=0.00..6997.97 rows=1749 width=66) (actual time=2.925..3.214 rows=1 loops=5941)
         ->  Seq Scan on nodes  (cost=0.00..0.00 rows=1 width=66) (actual time=0.000..0.000 rows=0 loops=5941)
               Filter: (subset.id = id)
         ->  Index Scan using nodes01_idversion_pk on nodes_01  (cost=0.56..151.70 rows=39 width=66) (actual time=0.013..0.013 rows=0 loops=5941)
               Index Cond: (id = subset.id)
         ->  Index Scan using nodes02_idversion_pk on nodes_02  (cost=0.56..219.02 rows=57 width=66) (actual time=0.012..0.012 rows=0 loops=5941)
               Index Cond: (id = subset.id)
-- skipped for nodes_03 to nodes_84
         ->  Index Scan using nodes85_idversion_pk on nodes_85  (cost=0.56..12.33 rows=2 width=66) (actual time=0.036..0.036 rows=0 loops=5941)
               Index Cond: (id = subset.id)
         ->  Index Scan using nodes86_idversion_pk on nodes_86  (cost=0.56..12.33 rows=2 width=66) (actual time=0.688..0.704 rows=1 loops=5941)
               Index Cond: (id = subset.id)
 Total runtime: 19142.983 ms
--Second attempt;
db=# EXPLAIN ANALYSE SELECT * FROM nodes WHERE id IN(SELECT * FROM subset); 
 Nested Loop  (cost=99.22..1403193.39 rows=1935067087 width=66) (actual time=2.282..1382.156 rows=5979 loops=1)
   ->  HashAggregate  (cost=99.22..101.22 rows=200 width=8) (actual time=1.834..4.327 rows=5941 loops=1)
         ->  Seq Scan on subset  (cost=0.00..84.78 rows=5778 width=8) (actual time=0.009..0.376 rows=5978 loops=1)
   ->  Append  (cost=0.00..6997.97 rows=1749 width=66) (actual time=0.225..0.226 rows=1 loops=5941)
         ->  Seq Scan on nodes  (cost=0.00..0.00 rows=1 width=66) (actual time=0.000..0.000 rows=0 loops=5941)
               Filter: (subset.id = id)
         ->  Index Scan using nodes01_idversion_pk on nodes_01  (cost=0.56..151.70 rows=39 width=66) (actual time=0.003..0.003 rows=0 loops=5941
               Index Cond: (id = subset.id)
         ->  Index Scan using nodes02_idversion_pk on nodes_02  (cost=0.56..219.02 rows=57 width=66) (actual time=0.002..0.002 rows=0 loops=5941
               Index Cond: (id = subset.id)
-- skipped for nodes_03 to nodes_84
         ->  Index Scan using nodes85_idversion_pk on nodes_85  (cost=0.56..12.33 rows=2 width=66) (actual time=0.002..0.002 rows=0 loops=5941)
               Index Cond: (id = subset.id)
         ->  Index Scan using nodes86_idversion_pk on nodes_86  (cost=0.56..12.33 rows=2 width=66) (actual time=0.004..0.004 rows=1 loops=5941)
               Index Cond: (id = subset.id)
 Total runtime: 1383.929 ms



-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general





[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]
  Powered by Linux