Re: Sequential scan instead of index scan

Ioannis Anagnostopoulos <ioannis@xxxxxxxxxx> · Mon, 06 Aug 2012 23:04:10 +0100



    On 06/08/2012 16:34, Tom Lane wrote:

    
      Ioannis Anagnostopoulos <ioannis@xxxxxxxxxx> writes:

      
                I think this is a pretty good plan and quite quick given the
        size of the table (88Million rows at present). However in real
        life the parameter where I search for msg_id is not an array of
        3 ids but of 300.000 or more. It is then that the query forgets
        the plan and goes to sequential scan. Is there any way around?

      
      If you've got that many, any(array[....]) is a bad choice.  I'd try
putting the IDs into a VALUES(...) list, or even a temporary table, and
then writing the query as a join.  It is a serious mistake to think that
a seqscan is evil when you're dealing with joining that many rows, btw.
What you should probably be looking for is a hash join plan.

			regards, tom lane

    
    Ok in that scenario we are back to square one. Following your
    suggestion my resultant query is this (the temporary table is
    tmp_tbl_messages)

    select 

                ship_pos_messages.*

            from 

                feed_all_y2012m08.ship_pos_messages join
    tmp_tbl_messages on (ship_pos_messages.msg_id =
    tmp_tbl_messages.msg_id)

            where 

                extract('day' from msg_date_rec) = 1

                AND date_trunc('day', msg_date_rec) = '2012-08-01'; 

    
    which gives us the following explain analyse:

    
    "Merge Join  (cost=1214220.48..3818359.46 rows=173574357
      width=128) (actual time=465036.958..479089.731 rows=341190
      loops=1)"

      "  Merge Cond: (feed_all_y2012m08.ship_pos_messages.msg_id =
      tmp_tbl_messages.msg_id)"

      "  ->  Sort  (cost=1178961.70..1179223.51 rows=104725
      width=128) (actual time=464796.971..476579.208 rows=19512873
      loops=1)"

      "        Sort Key: feed_all_y2012m08.ship_pos_messages.msg_id"

      "        Sort Method:  external merge  Disk: 1254048kB"

      "        ->  Append  (cost=0.00..1170229.60 rows=104725
      width=128) (actual time=0.033..438682.971 rows=19512883 loops=1)"

      "              ->  Seq Scan on ship_pos_messages 
      (cost=0.00..0.00 rows=1 width=100) (actual time=0.000..0.000
      rows=0 loops=1)"

      "                    Filter: ((date_part('day'::text,
      msg_date_rec) = 1::double precision) AND (date_trunc('day'::text,
      msg_date_rec) = '2012-08-01 00:00:00'::timestamp without time
      zone))"

      "              ->  Seq Scan on ship_a_pos_messages
      ship_pos_messages  (cost=0.00..0.00 rows=1 width=100) (actual
      time=0.000..0.000 rows=0 loops=1)"

      "                    Filter: ((date_part('day'::text,
      msg_date_rec) = 1::double precision) AND (date_trunc('day'::text,
      msg_date_rec) = '2012-08-01 00:00:00'::timestamp without time
      zone))"

      "              ->  Index Scan using
      idx_ship_b_std_pos_messages_date_trunc on ship_b_std_pos_messages
      ship_pos_messages  (cost=0.00..58657.09 rows=5269 width=128)
      (actual time=0.032..799.171 rows=986344 loops=1)"

      "                    Index Cond: (date_trunc('day'::text,
      msg_date_rec) = '2012-08-01 00:00:00'::timestamp without time
      zone)"

      "                    Filter: (date_part('day'::text, msg_date_rec)
      = 1::double precision)"

      "              ->  Index Scan using
      idx_ship_b_ext_pos_messages_date_trunc on ship_b_ext_pos_messages
      ship_pos_messages  (cost=0.00..1694.64 rows=141 width=128) (actual
      time=0.026..20.661 rows=26979 loops=1)"

      "                    Index Cond: (date_trunc('day'::text,
      msg_date_rec) = '2012-08-01 00:00:00'::timestamp without time
      zone)"

      "                    Filter: (date_part('day'::text, msg_date_rec)
      = 1::double precision)"

      "              ->  Index Scan using
      idx_ship_a_pos_messages_wk0_date_trunc on ship_a_pos_messages_wk0
      ship_pos_messages  (cost=0.00..1109877.86 rows=99313 width=128)
      (actual time=0.029..435784.376 rows=18499560 loops=1)"

      "                    Index Cond: (date_trunc('day'::text,
      msg_date_rec) = '2012-08-01 00:00:00'::timestamp without time
      zone)"

      "                    Filter: (date_part('day'::text, msg_date_rec)
      = 1::double precision)"

      "  ->  Sort  (cost=35258.79..36087.50 rows=331486 width=8)
      (actual time=239.908..307.576 rows=349984 loops=1)"

      "        Sort Key: tmp_tbl_messages.msg_id"

      "        Sort Method:  quicksort  Memory: 28694kB"

      "        ->  Seq Scan on tmp_tbl_messages  (cost=0.00..4863.86
      rows=331486 width=8) (actual time=0.047..55.227 rows=349984
      loops=1)"

      "Total runtime: 479336.869 ms"

    
    Which is a Merge join and not a hash. Any ideas how to make it a
    hash join?

    
    Kind Regards

    Yiannis