Re: Join the master table with other table is very slow (partitioning)

AI Rumman <rummandba@xxxxxxxxx> · Fri, 15 Mar 2013 11:12:26 -0400

On Fri, Mar 15, 2013 at 11:09 AM, Ao Jianwang <aojw2008@xxxxxxxxx> wrote:

Hi Rumman,

Thanks for your response. I follow the guide to build the partition. The settings should be good. See the following result. Any insight? thanks.

dailyest=# select version();
                                                  version                                                   

------------------------------------------------------------------------------------------------------------

 PostgreSQL 9.2.3 on x86_64-unknown-linux-gnu, compiled by gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3, 64-bit

(1 row)

dailyest=# show constraint_exclusion;

 constraint_exclusion 

----------------------
 on
(1 row)

On Fri, Mar 15, 2013 at 11:04 PM, AI Rumman <rummandba@xxxxxxxxx> wrote:

Which version of Postgresql are you using?Have you set constraint_exclusion to parition?

On Fri, Mar 15, 2013 at 11:02 AM, Ao Jianwang <aojw2008@xxxxxxxxx> wrote:

Hi Experts,

I found if we join the master table with other small table, then the running time is slow. While, if we join each child table with the small table, then it's very fast. Any comments and suggestions are greatly appreciated.

For example, par_list table is small(about 50k rows), while par_est is very large, for each day it's about 400MB. Therefore, we partition it by day. However, the query plan for joining the master table with par_list is bad, so the running time is slow. The good plan should be join each partition table with par_list separately, then aggregate the result together. 

1. Join the master table with a small table. It's slow.
dailyest=# explain (analyze on, buffers on)        

dailyest-#  SELECT e.date, max(e.estimate)
dailyest-#             FROM  
dailyest-#                 par_list l, 

dailyest-#                 par_est e
dailyest-#             WHERE 
dailyest-#                 l.id = e.list_id and

dailyest-#                 e.date BETWEEN '2012-07-08' and '2012-07-10' and 
dailyest-#                 l.fid = 1 and

dailyest-#                 l.sid = 143441 and
dailyest-#                 l.cid in (36, 39, 6000)  and
dailyest-#                 e.aid = 333710667

dailyest-#             GROUP BY e.date
dailyest-#             ORDER BY e.date;

-----------------------
 GroupAggregate  (cost=745326.86..745326.88 rows=1 width=8) (actual time=6281.364..6281.366 rows=3 loops=1)

   Buffers: shared hit=3 read=175869
   ->  Sort  (cost=745326.86..745326.86 rows=1 width=8) (actual time=6281.358..6281.358 rows=6 loops=1)

         Sort Key: e.date
         Sort Method: quicksort  Memory: 25kB
         Buffers: shared hit=3 read=175869

         ->  Nested Loop  (cost=0.00..745326.85 rows=1 width=8) (actual time=1228.493..6281.349 rows=6 loops=1)
               Join Filter: (l.id = e.list_id)

               Rows Removed by Join Filter: 4040
               Buffers: shared hit=3 read=175869
               ->  Seq Scan on par_list l  (cost=0.00..1213.10 rows=2 width=4) (actual time=0.010..38.272 rows=2 loops=1)

                     Filter: ((fid = 1) AND (sid = 143441) AND (cid = ANY ('{36,39,6000}'::integer[])))
                     Rows Removed by Filter: 50190

                     Buffers: shared hit=3 read=269
               ->  Materialize  (cost=0.00..744102.56 rows=407 width=12) (actual time=9.707..3121.053 rows=2023 loops=2)

                     Buffers: shared read=175600
                     ->  Append  (cost=0.00..744100.52 rows=407 width=12) (actual time=19.410..6240.044 rows=2023 loops=1)

                           Buffers: shared read=175600
                           ->  Seq Scan on par_est e  (cost=0.00..0.00 rows=1 width=12) (actual time=0.001..0.001 rows=0 loops=1)

                                 Filter: ((date >= '2012-07-08'::date) AND (date <= '2012-07-10'::date) AND (aid = 333710667))
                           ->  Seq Scan on par_est_2012_07 e  (cost=0.00..0.00 rows=1 width=12) (actual time=0.000..0.000 rows=0 loops=1)

                                 Filter: ((date >= '2012-07-08'::date) AND (date <= '2012-07-10'::date) AND (aid = 333710667))
                           ->  Seq Scan on par_est_2012_07_08 e  (cost=0.00..247736.09 rows=135 width=12) (actual time=19.408..2088.627 rows=674 loops=1)

                                 Filter: ((date >= '2012-07-08'::date) AND (date <= '2012-07-10'::date) AND (aid = 333710667))
                                 Rows Removed by Filter: 10814878

                                 Buffers: shared read=58463
                           ->  Seq Scan on par_est_2012_07_09 e  (cost=0.00..248008.81 rows=137 width=12) (actual time=6.390..1963.238 rows=676 loops=1)

                                 Filter: ((date >= '2012-07-08'::date) AND (date <= '2012-07-10'::date) AND (aid = 333710667))
                                 Rows Removed by Filter: 10826866

                                 Buffers: shared read=58528
                           ->  Seq Scan on par_est_2012_07_10 e  (cost=0.00..248355.62 rows=133 width=12) (actual time=15.135..2187.312 rows=673 loops=1)

                                 Filter: ((date >= '2012-07-08'::date) AND (date <= '2012-07-10'::date) AND (aid = 333710667))
                                 Rows Removed by Filter: 10841989

                                 Buffers: shared read=58609
 Total runtime: 6281.444 ms
(35 rows)

2. Join each partition table with small table (par_list) and union the result. This runs very fast. However, it's not reasonable if we union 180 SELECT statements (for example, the date is from 2012-07-01 to 2012-12-31. Any better suggestions.

dailyest=# explain (analyze on, buffers on)
dailyest-#  SELECT e.date, max(e.estimate)

dailyest-#             FROM  
dailyest-#                 par_list l, 
dailyest-#                 par_est_2012_07_08 e

dailyest-#             WHERE 
dailyest-#                 l.id = e.list_id and

dailyest-#                 e.date = '2012-07-08' and 
dailyest-#                 l.fid = 1 and
dailyest-#                 l.sid = 143441 and
dailyest-#                 l.cid in (36, 39, 6000)  and

dailyest-#                 e.aid = 333710667
dailyest-#             GROUP BY e.date
dailyest-# UNION ALL

dailyest-#  SELECT e.date, max(e.estimate)
dailyest-#             FROM  
dailyest-#                 par_list l, 

dailyest-#                 par_est_2012_07_09 e
dailyest-#             WHERE 
dailyest-#                 l.id = e.list_id and

dailyest-#                 e.date = '2012-07-09' and 
dailyest-#                 l.fid = 1 and
dailyest-#                 l.sid = 143441 and

dailyest-#                 l.cid in (36, 39, 6000)  and
dailyest-#                 e.aid = 333710667
dailyest-#             GROUP BY e.date

dailyest-# UNION ALL
dailyest-#  SELECT e.date, max(e.estimate)
dailyest-#             FROM  

dailyest-#                 par_list l, 
dailyest-#                 par_est_2012_07_10 e
dailyest-#             WHERE 

dailyest-#                 l.id = e.list_id and
dailyest-#                 e.date = '2012-07-10' and 

dailyest-#                 l.fid = 1 and
dailyest-#                 l.sid = 143441 and
dailyest-#                 l.cid in (36, 39, 6000)  and

dailyest-#                 e.aid = 333710667
dailyest-#             GROUP BY e.date
dailyest-# ;

                                                                                             QUERY PLAN                                       

----------------------------------------------------------------------------------------------------------------------------------------------

------------------------------------------------------
 Result  (cost=0.00..91.49 rows=3 width=8) (actual time=83.736..254.912 rows=3 loops=1)

   Buffers: shared hit=27 read=28
   ->  Append  (cost=0.00..91.49 rows=3 width=8) (actual time=83.735..254.910 rows=3 loops=1)

         Buffers: shared hit=27 read=28
         ->  GroupAggregate  (cost=0.00..30.48 rows=1 width=8) (actual time=83.735..83.735 rows=1 loops=1)

               Buffers: shared hit=9 read=12
               ->  Nested Loop  (cost=0.00..30.47 rows=1 width=8) (actual time=63.920..83.728 rows=2 loops=1)

                     Buffers: shared hit=9 read=12
                     ->  Index Scan using par_list_sid_fid_cid_key on par_list l  (cost=0.00..18.56 rows=2 width=4) (actual time=1.540..1.550 rows=2 loops=1)

                           Index Cond: ((sid = 143441) AND (fid = 1) AND (cid = ANY ('{36,39,6000}'::integer[])))
                           Buffers: shared hit=7 read=4

                     ->  Index Only Scan using par_est_2012_07_08_pkey on par_est_2012_07_08 e  (cost=0.00..5.94 rows=1 width=12) (actual time=41.083..41.083 rows=1 loops=2)

                           Index Cond: ((date = '2012-07-08'::date) AND (list_id = l.id) AND (aid = 333710667))

                           Heap Fetches: 0
                           Buffers: shared hit=2 read=8
         ->  GroupAggregate  (cost=0.00..30.48 rows=1 width=8) (actual time=76.911..76.911 rows=1 loops=1)

               Buffers: shared hit=9 read=8
               ->  Nested Loop  (cost=0.00..30.47 rows=1 width=8) (actual time=57.580..76.909 rows=2 loops=1)

                     Buffers: shared hit=9 read=8
                     ->  Index Scan using par_list_sid_fid_cid_key on par_list l  (cost=0.00..18.56 rows=2 width=4) (actual time=0.007..0.016 rows=2 loops=1)

                           Index Cond: ((sid = 143441) AND (fid = 1) AND (cid = ANY ('{36,39,6000}'::integer[])))
                           Buffers: shared hit=7

                     ->  Index Only Scan using par_est_2012_07_09_pkey on par_est_2012_07_09 e  (cost=0.00..5.94 rows=1 width=12) (actual time=38.440..38.442 rows=1 loops=2)

                           Index Cond: ((date = '2012-07-09'::date) AND (list_id = l.id) AND (aid = 333710667))

                           Heap Fetches: 0
                           Buffers: shared hit=2 read=8
         ->  GroupAggregate  (cost=0.00..30.49 rows=1 width=8) (actual time=94.262..94.262 rows=1 loops=1)

               Buffers: shared hit=9 read=8
               ->  Nested Loop  (cost=0.00..30.47 rows=1 width=8) (actual time=74.393..94.259 rows=2 loops=1)

                     Buffers: shared hit=9 read=8
                     ->  Index Scan using par_list_sid_fid_cid_key on par_list l  (cost=0.00..18.56 rows=2 width=4) (actual time=0.007..0.017 rows=2 loops=1)

                           Index Cond: ((sid = 143441) AND (fid = 1) AND (cid = ANY ('{36,39,6000}'::integer[])))
                           Buffers: shared hit=7

                     ->  Index Only Scan using par_est_2012_07_10_pkey on par_est_2012_07_10 e  (cost=0.00..5.95 rows=1 width=12) (actual time=47.116..47.117 rows=1 loops=2)

                           Index Cond: ((date = '2012-07-10'::date) AND (list_id = l.id) AND (aid = 333710667))

                           Heap Fetches: 0
                           Buffers: shared hit=2 read=8
 Total runtime: 255.074 ms
(38 rows)

At first. you may try the following out and find out if the partition constraint exclusion is working or not::
explain
select * 
FROM  
par_est e
WHERE 
e.date BETWEEN '2012-07-08' and '2012-07-10'