Re: Explain plan changes - IN CLAUSE ( Passing direct values Vs INNER Query )

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 5/7/20 10:49 AM, Amarendra Konda wrote:
Hi David,

Thanks for the reply.This has optimized number of rows.

Yeah, but your execution time has increased an order of magnitude. Not sure if that is what you want.


Can you please explain, why it is getting more columns in output, even though we have asked for only one column ?


 EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS)  SELECT pa.process_activity_id AS pa_process_activity_id  FROM process_activity pa WHERE pa.app_id = '126502930200650' AND pa.created > '1970-01-01 00:00:00'  AND EXISTS ( SELECT 1 FROM process_instance pi where pi.app_id = pa.app_id  AND pi.user_id = '137074931866340') ORDER BY pa.process_instance_id,m.created limit 50;

    QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-----------------
 Limit  (cost=1.14..37.39 rows=50 width=24) (actual time=821.283..891.629 rows=50 loops=1)
    Output: pa.process_activity_id, pa.process_instance_id, pa.created
    Buffers: shared hit=274950
   ->  Nested Loop Semi Join  (cost=1.14..266660108.78 rows=367790473 width=24) (actual time=821.282..891.607 rows=50 loops=1)
          Output: pa.process_activity_id, pa.process_instance_id, pa.created
          Buffers: shared hit=274950
         ->  Index Scan using process_activity_process_instance_id_app_id_created_idx on public.process_activity pa  (cost=0.70..262062725.21 rows=367790473 width=32) (actual time=821.253..891.517 rows=50 loops=1) * Output: pa.process_activity_id, pa.process_activity_type, pa.voice_url, pa.process_activity_user_id, pa.app_id, pa.process_instance_id, pa.alias, pa.read_by_user, pa.source, pa.label_category_id, pa.label_id, pa.csat_response_id, m.process_activity_fragments, pa.created, pa.updated, pa.rule_id, pa.marketing_reply_id, pa.delivered_at, pa.reply_fragments, pa.status_fragment, pa.internal_meta, pa.interaction_id, pa.do_not_translate, pa.should_translat
e, pa.in_reply_to*
               Index Cond: ((m.app_id = '126502930200650'::bigint) AND (m.created > '1970-01-01 00:00:00'::timestamp without time zone))
                Buffers: shared hit=274946
         ->  Materialize  (cost=0.43..2.66 rows=1 width=8) (actual time=0.001..0.001 rows=1 loops=50)
                Output: pi.app_id
                Buffers: shared hit=4
               ->  Index Scan using fki_conv_konotor_user_user_id on public.process_instance pi  (cost=0.43..2.66 rows=1 width=8) (actual time=0.020..0.020 rows=1 loops=1)
                      Output: pi.app_id
                      Index Cond: (pi.user_id = '137074931866340'::bigint)
                      Filter: (pi.app_id = '126502930200650'::bigint)
                      Buffers: shared hit=4
  Planning time: 0.297 ms
  Execution time: 891.686 ms
(20 rows)

On Thu, May 7, 2020 at 9:17 PM David G. Johnston <david.g.johnston@xxxxxxxxx <mailto:david.g.johnston@xxxxxxxxx>> wrote:

    On Thu, May 7, 2020 at 7:40 AM Adrian Klaver
    <adrian.klaver@xxxxxxxxxxx <mailto:adrian.klaver@xxxxxxxxxxx>> wrote:

        On 5/7/20 4:19 AM, Amarendra Konda wrote:
         > Hi,
         >
         > PostgreSQL version : PostgreSQL 9.6.2 on x86_64-pc-linux-gnu,
        compiled
         > by gcc (GCC) 4.8.3 20140911 (Red Hat 4.8.3-9), 64-bit
         >
         > We have noticed huge difference interms of execution plan (
        response
         > time) , When we pass the direct values  Vs  inner query to IN
        clause.
         >
         > High level details of the use case are as follows
         >
         >   * As part of the SQL there are 2 tables named Process_instance
         >     (master) and Process_activity ( child)
         >   * Wanted to fetch TOP 50 rows from  Process_activity table
        for the
         >     given values of the Process_instance.
         >   * When we used Inner Join / Inner query ( query1)  between
        parent
         >     table and child table , LIMIT is not really taking in to
        account.
         >     Instead it is fetching more rows and columns that
        required, and
         >     finally limiting the result

        It is doing what you told it to do which is SELECT all
        process_instance_i's for user_id='317079413683604' and app_id =
        '427380312000560' and then filtering further. I am going to
        guess that
        if you run the inner query alone you will find it returns ~23496
        rows.
        You might have better results if you an actual join between
        process_activity and process_instance. Something like
        below(obviously
        not tested):


    What the OP seems to want is a semi-join:

    (not tested)

    SELECT pa.process_activity_id
    FROM process_activity pa WHERE pa.app_id = '427380312000560' AND
    pa.created > '1970-01-01 00:00:00'
    ANDEXISTS (
       SELECT 1 FROM process_instance pi WHERE pi.app_id = pa.app_id AND
    pi.user_id = '317079413683604'
    )
    ORDER BY
    pa.process_instance_id,
    pa.created limit 50;

    I'm unsure exactly how this will impact the plan choice but it
    should be an improvement, and in any case more correctly defines
    what it is you are looking for.

    David J.



--
Adrian Klaver
adrian.klaver@xxxxxxxxxxx





[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux