Hi Yiannis,
Is there anything linking these ids together, or are they relatively
random? If they are relatively random, the rows are likely to be
sprinkled amongst many blocks, and so a seq scan is the fastest option.
I've seen similar problems with indexed queries in a multi-tenant
database where the data is so fragmented that, once the record volume
hits a certain threshold, Postgres decides to table scan rather than
use an index.
The query optimiser is unlikely to be able to determine the disk
locality of 300k rows and so it just takes a punt on a seq scan.
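As a quick check (just a sketch; adjust the schema and table names to
match yours), pg_stats exposes a correlation statistic showing how well
the physical row order tracks msg_id:

    -- Correlation near +1 or -1 means the table is stored roughly in
    -- msg_id order; near 0 means matching rows are scattered across blocks.
    SELECT schemaname, tablename, attname, correlation
    FROM pg_stats
    WHERE schemaname = 'feed_all_y2012m08'
      AND attname = 'msg_id';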
If you added another filter condition on something indexed, e.g. last
week, last month, or location, you might do better, provided the data
does exhibit disk locality. If the data really is scattered, then a
seq scan really will be quicker.
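For example, something along these lines (a sketch only; it assumes a
plain btree index on msg_date_rec, which I'm guessing at):

    -- A sargable range on msg_date_rec can use a btree index directly,
    -- unlike extract('day' from msg_date_rec) = 1, which can't without
    -- an expression index.
    SELECT msg_id, msg_type, pos_georef1, pos_georef2, pos_georef3,
           pos_georef4, obj_id, ship_speed, ship_heading, ship_course,
           pos_point
    FROM feed_all_y2012m08.ship_pos_messages
    WHERE msg_date_rec >= DATE '2012-08-01'
      AND msg_date_rec <  DATE '2012-08-02'
      AND msg_id = ANY (ARRAY[7294724,14174174,22254408]);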
Regards, David
On 06/08/12 23:08, Ioannis Anagnostopoulos wrote:
Hi, my query is very simple:
select
msg_id,
msg_type,
ship_pos_messages.pos_georef1,
ship_pos_messages.pos_georef2,
ship_pos_messages.pos_georef3,
ship_pos_messages.pos_georef4,
obj_id,
ship_speed,
ship_heading,
ship_course,
pos_point
from
feed_all_y2012m08.ship_pos_messages
where
extract('day' from msg_date_rec) = 1
AND msg_id = any(ARRAY[7294724,14174174,22254408]);
The msg_id is the pkey on the ship_pos_messages table, and in this
example the query is fast because it uses the pkey (primary key
index) to make the selection. The EXPLAIN ANALYZE output follows:
"Result (cost=0.00..86.16 rows=5 width=117) (actual
time=128.734..163.319 rows=3 loops=1)"
" -> Append (cost=0.00..86.16 rows=5 width=117) (actual
time=128.732..163.315 rows=3 loops=1)"
" -> Seq Scan on ship_pos_messages (cost=0.00..0.00
rows=1 width=100) (actual time=0.001..0.001 rows=0 loops=1)"
" Filter: ((msg_id = ANY
('{7294724,14174174,22254408}'::integer[])) AND
(date_part('day'::text, msg_date_rec) = 1::double precision))"
" -> Seq Scan on ship_a_pos_messages
ship_pos_messages (cost=0.00..0.00 rows=1 width=100) (actual
time=0.000..0.000 rows=0 loops=1)"
" Filter: ((msg_id = ANY
('{7294724,14174174,22254408}'::integer[])) AND
(date_part('day'::text, msg_date_rec) = 1::double precision))"
" -> Bitmap Heap Scan on ship_b_std_pos_messages
ship_pos_messages (cost=13.41..25.42 rows=1 width=128) (actual
time=49.127..49.127 rows=0 loops=1)"
" Recheck Cond: (msg_id = ANY
('{7294724,14174174,22254408}'::integer[]))"
" Filter: (date_part('day'::text, msg_date_rec) =
1::double precision)"
" -> Bitmap Index Scan on
ship_b_std_pos_messages_pkey (cost=0.00..13.41 rows=3 width=0)
(actual time=49.125..49.125 rows=0 loops=1)"
" Index Cond: (msg_id = ANY
('{7294724,14174174,22254408}'::integer[]))"
" -> Bitmap Heap Scan on ship_b_ext_pos_messages
ship_pos_messages (cost=12.80..24.62 rows=1 width=128) (actual
time=0.029..0.029 rows=0 loops=1)"
" Recheck Cond: (msg_id = ANY
('{7294724,14174174,22254408}'::integer[]))"
" Filter: (date_part('day'::text, msg_date_rec) =
1::double precision)"
" -> Bitmap Index Scan on
ship_b_ext_pos_messages_pkey (cost=0.00..12.80 rows=3 width=0)
(actual time=0.027..0.027 rows=0 loops=1)"
" Index Cond: (msg_id = ANY
('{7294724,14174174,22254408}'::integer[]))"
" -> Bitmap Heap Scan on ship_a_pos_messages_wk0
ship_pos_messages (cost=24.08..36.12 rows=1 width=128) (actual
time=79.572..114.152 rows=3 loops=1)"
" Recheck Cond: (msg_id = ANY
('{7294724,14174174,22254408}'::integer[]))"
" Filter: (date_part('day'::text, msg_date_rec) =
1::double precision)"
" -> Bitmap Index Scan on
ship_a_pos_messages_wk0_pkey (cost=0.00..24.08 rows=3 width=0)
(actual time=67.441..67.441 rows=3 loops=1)"
" Index Cond: (msg_id = ANY
('{7294724,14174174,22254408}'::integer[]))"
"Total runtime: 180.146 ms"
I think this is a pretty good plan, and quite quick given the size
of the table (88 million rows at present). However, in real life the
parameter where I search for msg_id is not an array of 3 ids but of
300,000 or more. It is then that the planner abandons this plan and
falls back to a sequential scan. Is there any way around this? Or is
this the best I can have?
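One thing I wondered about, as a sketch only (assuming I can load the
ids into a temporary table first): would joining against an indexed
temp table, instead of passing a huge array, give the planner a
better shot?

    -- Load the wanted ids into an indexed temp table and join, so the
    -- planner has real statistics about how many ids there are.
    CREATE TEMP TABLE wanted_ids (msg_id integer PRIMARY KEY);
    INSERT INTO wanted_ids VALUES (7294724), (14174174), (22254408); -- 300,000 rows in practice
    ANALYZE wanted_ids;

    SELECT p.msg_id, p.msg_type, p.pos_georef1, p.pos_georef2,
           p.pos_georef3, p.pos_georef4, p.obj_id, p.ship_speed,
           p.ship_heading, p.ship_course, p.pos_point
    FROM feed_all_y2012m08.ship_pos_messages p
    JOIN wanted_ids w ON w.msg_id = p.msg_id
    WHERE extract('day' from p.msg_date_rec) = 1;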
Kind Regards
Yiannis