Re: Sequential scan instead of index scan

Ioannis Anagnostopoulos <ioannis@xxxxxxxxxx> · Mon, 06 Aug 2012 16:24:57 +0100



    They are random as the data are coming
      from multiple threads that are inserting in the database. I see
      what you say about "linking them", and I may give it a try with
      the date. The other think that "links" them together is the 4
      georef fields, however at that stage I am trying to collect
      statistics on the georefs population of "msg_id" so I don't know
      before hand the values to limit my query on them... Do you think
      an index on "date, msg_id" might do something?

      
      Yiannis

      
      On 06/08/2012 16:16, David Barton wrote:

    
      Hi Yiannis,

      
      Is there anything linking these ids together, or are the
      relatively random?  If they are relatively random, the rows are
      likely to be sprinkled amongst many blocks and so a seq scan is
      the fastest.  I've seen similar problems with indexed queries in a
      multi-tennant database where the data is so fragmented that once
      the record volume hits a certain threshold, Postgres decides to
      table scan rather than use an index.

      
      The query optimiser is unlikely to be able to determine the disk
      locality of 300k rows and so it just takes a punt on a seq scan.

      
      If you added another filter condition on something indexed e.g.
      last week or last month or location or something, you might do
      better if the data does exhibit disk locality.  If the data really
      is scattered, then a seq scan really will be quicker.

      
      Regards, David

      
      On 06/08/12 23:08, Ioannis
        Anagnostopoulos wrote:

      
        Hi, my query is very simple:

        
        select 

                    msg_id,

                    msg_type,

                    ship_pos_messages.pos_georef1,

                    ship_pos_messages.pos_georef2,

                    ship_pos_messages.pos_georef3,

                    ship_pos_messages.pos_georef4,

                    obj_id,

                    ship_speed,

                    ship_heading,

                    ship_course,

                    pos_point

                from 

                    feed_all_y2012m08.ship_pos_messages 

                where 

                    extract('day' from msg_date_rec) = 1

                    AND msg_id = any(ARRAY[7294724,14174174,22254408]);

        
        The msg_id is the pkey on the ship_pos_messages table and
          in this example it is working fast as it uses the pkey
          (primary key index) to make the selection. The expplain anayze
          follows:

          "Result  (cost=0.00..86.16 rows=5 width=117) (actual
          time=128.734..163.319 rows=3 loops=1)"

          "  ->  Append  (cost=0.00..86.16 rows=5 width=117) (actual
          time=128.732..163.315 rows=3 loops=1)"

          "        ->  Seq Scan on ship_pos_messages 
          (cost=0.00..0.00 rows=1 width=100) (actual time=0.001..0.001
          rows=0 loops=1)"

          "              Filter: ((msg_id = ANY
          ('{7294724,14174174,22254408}'::integer[])) AND
          (date_part('day'::text, msg_date_rec) = 1::double precision))"

          "        ->  Seq Scan on ship_a_pos_messages
          ship_pos_messages  (cost=0.00..0.00 rows=1 width=100) (actual
          time=0.000..0.000 rows=0 loops=1)"

          "              Filter: ((msg_id = ANY
          ('{7294724,14174174,22254408}'::integer[])) AND
          (date_part('day'::text, msg_date_rec) = 1::double precision))"

          "        ->  Bitmap Heap Scan on ship_b_std_pos_messages
          ship_pos_messages  (cost=13.41..25.42 rows=1 width=128)
          (actual time=49.127..49.127 rows=0 loops=1)"

          "              Recheck Cond: (msg_id = ANY
          ('{7294724,14174174,22254408}'::integer[]))"

          "              Filter: (date_part('day'::text, msg_date_rec) =
          1::double precision)"

          "              ->  Bitmap Index Scan on
          ship_b_std_pos_messages_pkey  (cost=0.00..13.41 rows=3
          width=0) (actual time=49.125..49.125 rows=0 loops=1)"

          "                    Index Cond: (msg_id = ANY
          ('{7294724,14174174,22254408}'::integer[]))"

          "        ->  Bitmap Heap Scan on ship_b_ext_pos_messages
          ship_pos_messages  (cost=12.80..24.62 rows=1 width=128)
          (actual time=0.029..0.029 rows=0 loops=1)"

          "              Recheck Cond: (msg_id = ANY
          ('{7294724,14174174,22254408}'::integer[]))"

          "              Filter: (date_part('day'::text, msg_date_rec) =
          1::double precision)"

          "              ->  Bitmap Index Scan on
          ship_b_ext_pos_messages_pkey  (cost=0.00..12.80 rows=3
          width=0) (actual time=0.027..0.027 rows=0 loops=1)"

          "                    Index Cond: (msg_id = ANY
          ('{7294724,14174174,22254408}'::integer[]))"

          "        ->  Bitmap Heap Scan on ship_a_pos_messages_wk0
          ship_pos_messages  (cost=24.08..36.12 rows=1 width=128)
          (actual time=79.572..114.152 rows=3 loops=1)"

          "              Recheck Cond: (msg_id = ANY
          ('{7294724,14174174,22254408}'::integer[]))"

          "              Filter: (date_part('day'::text, msg_date_rec) =
          1::double precision)"

          "              ->  Bitmap Index Scan on
          ship_a_pos_messages_wk0_pkey  (cost=0.00..24.08 rows=3
          width=0) (actual time=67.441..67.441 rows=3 loops=1)"

          "                    Index Cond: (msg_id = ANY
          ('{7294724,14174174,22254408}'::integer[]))"

          "Total runtime: 180.146 ms"

        
        I think this is a pretty good plan and quite quick given the
        size of the table (88Million rows at present). However in real
        life the parameter where I search for msg_id is not an array of
        3 ids but of 300.000 or more. It is then that the query forgets
        the plan and goes to sequential scan. Is there any way around?
        Or is this the best I can have?

        
        Kind Regards

        Yiannis