Hi Guys,

The next level of query is the following; if this works, I guess 90% of the problem will be solved:

SELECT COUNT(DISTINCT TARGET_ID)
FROM S_V_F_PROMOTION_HISTORY_EMAIL PH
INNER JOIN S_V_D_CAMPAIGN_HIERARCHY CH
        ON PH.TOUCHPOINT_EXECUTION_ID = CH.TOUCHPOINT_EXECUTION_ID
WHERE 1=1
  AND SEND_DT >= '2014-03-13'
  AND SEND_DT <= '2015-03-14'

In this query I am joining two views that were originally built with CTEs; I have replaced the CTEs with subqueries. The views used to give me output in around 5-10 minutes, and now I get the same result in around 3-4 seconds. But when I execute the query written above, I am stuck again. I am attaching the query plan, also available at this link:

http://explain.depesz.com/s/REeu

I can see that most of the time is spent inside a nested loop, and the total cost comes out to cost=338203.81..338203.82. How do I take care of this?

I need to run this query in a report, so I cannot create a table like "select * from views" and then join against that table. If I do that, I get the answer to the whole big query in some 6-7 seconds, but that is not feasible: a Jasper report can have only one single (big/small) query.

Let me know if you need any other information.

Thanks a ton!
Vivek

-----Original Message-----
From: Jim Nasby [mailto:Jim.Nasby@xxxxxxxxxxxxxx]
Sent: Tuesday, March 17, 2015 5:36 AM
To: Tomas Vondra; vjoshi@xxxxxxxxxxxxxxxxxxx; Scott Marlowe; Varadharajan Mukundan
Cc: pgsql-performance@xxxxxxxxxxxxxx
Subject: Re: Performance issues

On 3/16/15 3:59 PM, Tomas Vondra wrote:
> On 16.3.2015 20:43, Jim Nasby wrote:
>> On 3/13/15 7:12 PM, Tomas Vondra wrote:
>>> (4) I suspect many of the relations referenced in the views are not
>>>     actually needed in the query, i.e. the join is performed but
>>>     then it's just discarded because those columns are not used.
>>>     Try to simplify the views as much as possible - remove all the
>>>     tables that are not really necessary to run the query.
>>>     If two queries need different tables, maybe defining two views
>>>     is a better approach.
>>
>> A better alternative with multi-purpose views is to use an outer join
>> instead of an inner join. With an outer join, if you ultimately don't
>> refer to any of the columns in a particular table, Postgres will
>> remove the table from the query completely.
>
> Really? Because a quick test suggests otherwise:
>
> db=# create table test_a (id int);
> CREATE TABLE
> db=# create table test_b (id int);
> CREATE TABLE
> db=# explain select test_a.* from test_a left join test_b using (id);
>                               QUERY PLAN
> ----------------------------------------------------------------------
>  Merge Left Join  (cost=359.57..860.00 rows=32512 width=4)
>    Merge Cond: (test_a.id = test_b.id)
>    ->  Sort  (cost=179.78..186.16 rows=2550 width=4)
>          Sort Key: test_a.id
>          ->  Seq Scan on test_a  (cost=0.00..35.50 rows=2550 width=4)
>    ->  Sort  (cost=179.78..186.16 rows=2550 width=4)
>          Sort Key: test_b.id
>          ->  Seq Scan on test_b  (cost=0.00..35.50 rows=2550 width=4)
> (8 rows)
>
> Also, how would that work with duplicate rows in the referenced table?
Right, I neglected to mention that the omitted table must also be unique on the join key:

decibel@decina.attlocal=# create table a(a_id serial primary key);
CREATE TABLE
decibel@decina.attlocal=# create table b(a_id int);
CREATE TABLE
decibel@decina.attlocal=# explain analyze select a.* from a left join b using(a_id);
                                             QUERY PLAN
-----------------------------------------------------------------------------------------------------
 Hash Right Join  (cost=67.38..137.94 rows=2550 width=4) (actual time=0.035..0.035 rows=0 loops=1)
   Hash Cond: (b.a_id = a.a_id)
   ->  Seq Scan on b  (cost=0.00..35.50 rows=2550 width=4) (never executed)
   ->  Hash  (cost=35.50..35.50 rows=2550 width=4) (actual time=0.002..0.002 rows=0 loops=1)
         Buckets: 4096  Batches: 1  Memory Usage: 32kB
         ->  Seq Scan on a  (cost=0.00..35.50 rows=2550 width=4) (actual time=0.001..0.001 rows=0 loops=1)
 Planning time: 0.380 ms
 Execution time: 0.086 ms
(8 rows)

decibel@decina.attlocal=# alter table b add primary key(a_id);
ALTER TABLE
decibel@decina.attlocal=# explain analyze select a.* from a left join b using(a_id);
                                          QUERY PLAN
-----------------------------------------------------------------------------------------------
 Seq Scan on a  (cost=0.00..35.50 rows=2550 width=4) (actual time=0.001..0.001 rows=0 loops=1)
 Planning time: 0.247 ms
 Execution time: 0.029 ms
(3 rows)

decibel@decina.attlocal=# alter table a drop constraint a_pkey;
ALTER TABLE
decibel@decina.attlocal=# explain analyze select a.* from a left join b using(a_id);
                                          QUERY PLAN
-----------------------------------------------------------------------------------------------
 Seq Scan on a  (cost=0.00..35.50 rows=2550 width=4) (actual time=0.001..0.001 rows=0 loops=1)
 Planning time: 0.098 ms
 Execution time: 0.011 ms
(3 rows)

--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com
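[Editor's note: Tomas's question about duplicate rows is exactly why Jim's uniqueness requirement exists — with duplicates on the join key, a LEFT JOIN can change the row count, so the planner cannot drop the table even when none of its columns are referenced. A minimal self-contained illustration of that row-count arithmetic (plain SQLite via Python here, purely to show the semantics; the join-removal optimization itself is Postgres planner behavior, as in the psql session above):]

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE a (a_id INTEGER PRIMARY KEY)")
con.execute("CREATE TABLE b (a_id INTEGER)")  # NOT unique on the join key
con.executemany("INSERT INTO a VALUES (?)", [(1,), (2,)])
con.executemany("INSERT INTO b VALUES (?)", [(1,), (1,)])  # duplicate key 1

# No b columns are referenced in the select list, yet the LEFT JOIN still
# inflates the result: a_id=1 matches two b rows, so 2 rows in a become 3.
rows = con.execute(
    "SELECT a.a_id FROM a LEFT JOIN b USING (a_id)"
).fetchall()
print(len(rows))  # 3 -- dropping b would change the answer, so it can't be removed
```

With a unique (or primary key) constraint on b.a_id, each row of a matches at most one row of b, the join provably cannot change the result, and only then is removing the join safe.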
 Aggregate  (cost=338203.81..338203.82 rows=1 width=8)
   ->  Nested Loop  (cost=51266.84..338203.81 rows=1 width=8)
         Join Filter: (s_f_touchpoint_execution_status_history_1.touchpoint_execution_id = s_f_touchpoint_execution_status_history.touchpoint_execution_id)
         ->  Nested Loop  (cost=32782.19..317837.83 rows=1 width=32)
               ->  Nested Loop  (cost=32781.90..317837.50 rows=1 width=40)
                     ->  Nested Loop  (cost=32781.61..317829.18 rows=1 width=24)
                           Join Filter: (base.touchpoint_execution_id = s_f_touchpoint_execution_status_history_1.touchpoint_execution_id)
                           ->  Unique  (cost=32781.61..34495.50 rows=1 width=8)
                                 ->  Merge Join  (cost=32781.61..34495.50 rows=1 width=8)
                                       Merge Cond: ((s_f_touchpoint_execution_status_history_1.touchpoint_execution_id = s_f_touchpoint_execution_status_history_1_1.touchpoint_execution_id) AND (s_f_touchpoint_execution_status_history_1.creation_dt = (max(s_f_touchpoint_execution_status_history_1_1.creation_dt))))
                                       ->  Sort  (cost=19697.87..20098.14 rows=160107 width=16)
                                             Sort Key: s_f_touchpoint_execution_status_history_1.touchpoint_execution_id, s_f_touchpoint_execution_status_history_1.creation_dt
                                             ->  Seq Scan on s_f_touchpoint_execution_status_history s_f_touchpoint_execution_status_history_1  (cost=0.00..5857.68 rows=160107 width=16)
                                                   Filter: (touchpoint_execution_status_type_id = ANY ('{3,4,6}'::integer[]))
                                       ->  Sort  (cost=13083.74..13254.76 rows=68410 width=16)
                                             Sort Key: s_f_touchpoint_execution_status_history_1_1.touchpoint_execution_id, (max(s_f_touchpoint_execution_status_history_1_1.creation_dt))
                                             ->  HashAggregate  (cost=6221.56..6905.66 rows=68410 width=16)
                                                   ->  Seq Scan on s_f_touchpoint_execution_status_history s_f_touchpoint_execution_status_history_1_1  (cost=0.00..4766.04 rows=291104 width=16)
                           ->  Seq Scan on s_f_promotion_history base  (cost=0.00..283333.66 rows=1 width=32)
                                 Filter: ((send_dt >= '2014-03-13 00:00:00'::timestamp without time zone) AND (send_dt <= '2015-03-14 00:00:00'::timestamp without time zone))
                     ->  Index Scan using idx_s_d_touchpoint_execution_id on s_d_touchpoint_execution tpe  (cost=0.29..8.31 rows=1 width=16)
                           Index Cond: (touchpoint_execution_id = s_f_touchpoint_execution_status_history_1.touchpoint_execution_id)
               ->  Index Only Scan using s_d_touchpoint_pkey on s_d_touchpoint tp  (cost=0.29..0.32 rows=1 width=8)
                     Index Cond: ((touchpoint_id = tpe.touchpoint_id) AND (channel_type_id = 1))
         ->  Nested Loop  (cost=18484.65..20365.97 rows=1 width=24)
               ->  Hash Join  (cost=18484.52..20365.80 rows=1 width=26)
                     Hash Cond: ((s_f_touchpoint_execution_status_history_1_2.touchpoint_execution_id = s_f_touchpoint_execution_status_history.touchpoint_execution_id) AND ((max(s_f_touchpoint_execution_status_history_1_2.creation_dt)) = s_f_touchpoint_execution_status_history.creation_dt))
                     ->  HashAggregate  (cost=6221.56..6905.66 rows=68410 width=16)
                           ->  Seq Scan on s_f_touchpoint_execution_status_history s_f_touchpoint_execution_status_history_1_2  (cost=0.00..4766.04 rows=291104 width=16)
                     ->  Hash  (cost=12262.93..12262.93 rows=2 width=26)
                           ->  Hash Join  (cost=6259.57..12262.93 rows=2 width=26)
                                 Hash Cond: (s_f_touchpoint_execution_status_history.touchpoint_execution_id = tp_exec.touchpoint_execution_id)
                                 ->  Seq Scan on s_f_touchpoint_execution_status_history  (cost=0.00..5493.80 rows=135878 width=16)
                                       Filter: (touchpoint_execution_status_type_id = ANY ('{3,4}'::integer[]))
                                 ->  Hash  (cost=6259.55..6259.55 rows=1 width=10)
                                       ->  Nested Loop Left Join  (cost=1955.27..6259.55 rows=1 width=10)
                                             ->  Nested Loop  (cost=1954.99..6259.24 rows=1 width=18)
                                                   ->  Nested Loop  (cost=1954.71..6258.92 rows=1 width=34)
                                                         Join Filter: (camp_exec.campaign_id = wave.campaign_id)
                                                         ->  Nested Loop  (cost=1954.42..6254.67 rows=13 width=42)
                                                               ->  Hash Join  (cost=1954.13..6249.67 rows=13 width=42)
                                                                     Hash Cond: ((tp_exec.touchpoint_id = tp_1.touchpoint_id) AND (wave_exec.wave_id = tp_1.wave_id))
                                                                     ->  Hash Join  (cost=1576.83..4595.51 rows=72956 width=40)
                                                                           Hash Cond: (tp_exec.wave_execution_id = wave_exec.wave_execution_id)
                                                                           ->  Seq Scan on s_d_touchpoint_execution tp_exec  (cost=0.00..1559.56 rows=72956 width=34)
                                                                           ->  Hash  (cost=1001.37..1001.37 rows=46037 width=24)
                                                                                 ->  Seq Scan on s_d_wave_execution wave_exec  (cost=0.00..1001.37 rows=46037 width=24)
                                                                     ->  Hash  (cost=212.72..212.72 rows=10972 width=18)
                                                                           ->  Seq Scan on s_d_touchpoint tp_1  (cost=0.00..212.72 rows=10972 width=18)
                                                               ->  Index Scan using s_d_campaign_execution_idx on s_d_campaign_execution camp_exec  (cost=0.29..0.37 rows=1 width=18)
                                                                     Index Cond: (campaign_execution_id = wave_exec.campaign_execution_id)
                                                         ->  Index Scan using s_d_wave_pkey on s_d_wave wave  (cost=0.29..0.31 rows=1 width=16)
                                                               Index Cond: (wave_id = wave_exec.wave_id)
                                                   ->  Index Scan using s_d_campaign_pkey on s_d_campaign camp  (cost=0.29..0.32 rows=1 width=10)
                                                         Index Cond: (campaign_id = camp_exec.campaign_id)
                                             ->  Index Only Scan using s_d_content_pkey on s_d_content content  (cost=0.28..0.30 rows=1 width=8)
                                                   Index Cond: (content_id = tp_exec.content_id)
               ->  Index Only Scan using d_channel_pk on s_d_channel_type channel  (cost=0.13..0.15 rows=1 width=2)
                     Index Cond: (channel_type_id = tp_1.channel_type_id)
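[Editor's note: the immediate red flag in this plan is the Seq Scan on s_f_promotion_history, where the planner estimates rows=1 for a full one-year send_dt range despite a cost of 283333.66. That rows=1 misestimate is what makes the surrounding nested loops look cheap. A sketch of things worth trying, not taken from the thread — the index name is hypothetical and the statistics target is an illustrative value:]

```sql
-- Refresh statistics so the send_dt range estimate becomes realistic
ANALYZE s_f_promotion_history;

-- If ANALYZE alone doesn't help, give the planner finer-grained
-- statistics on the filtered column, then re-analyze
ALTER TABLE s_f_promotion_history
    ALTER COLUMN send_dt SET STATISTICS 1000;
ANALYZE s_f_promotion_history;

-- An index on send_dt lets a selective date range avoid the
-- 283k-cost sequential scan entirely (hypothetical index name)
CREATE INDEX idx_promotion_history_send_dt
    ON s_f_promotion_history (send_dt);
```

If the estimate stays at rows=1 even after ANALYZE, check whether send_dt is produced by an expression or function inside the view, since expressions defeat plain column statistics.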
--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance