Re: Sequential scan instead of index scan

Ioannis Anagnostopoulos <ioannis@xxxxxxxxxx> · Tue, 07 Aug 2012 13:15:06 +0100



      Offhand I'd have thought that ANALYZE would gather stats on the
date_trunc _expression_ (because it is indexed) and then you should get
something reasonably accurate for a comparison to a constant.
"Reasonably accurate" meaning "not off by two orders of magnitude".
Practically all of your runtime is going into this one indexscan,
and TBH it seems likely you'd be better off with a seqscan there.

			regards, tom lane

    
    You were right, after running ANALYZE on the temp table I eventually
    got the HASH JOIN we were talking about. Here is the plan:

    
    "Hash Join  (cost=379575.54..1507341.18 rows=95142 width=128)
      (actual time=3128.940..634179.270 rows=10495795 loops=1)"

      "  Hash Cond: (feed_all_y2012m08.ship_pos_messages.msg_id =
      tmp_tbl_messages.msg_id)"

      "  ->  Append  (cost=0.00..1073525.24 rows=95142 width=128)
      (actual time=37.157..599002.314 rows=18891614 loops=1)"

      "        ->  Seq Scan on ship_pos_messages  (cost=0.00..0.00
      rows=1 width=100) (actual time=0.001..0.001 rows=0 loops=1)"

      "              Filter: ((date_part('day'::text, msg_date_rec) =
      2::double precision) AND (date_trunc('day'::text, msg_date_rec) =
      '2012-08-02 00:00:00'::timestamp without time zone))"

      "        ->  Seq Scan on ship_a_pos_messages ship_pos_messages 
      (cost=0.00..0.00 rows=1 width=100) (actual time=0.000..0.000
      rows=0 loops=1)"

      "              Filter: ((date_part('day'::text, msg_date_rec) =
      2::double precision) AND (date_trunc('day'::text, msg_date_rec) =
      '2012-08-02 00:00:00'::timestamp without time zone))"

      "        ->  Index Scan using
      idx_ship_b_std_pos_messages_date_trunc on ship_b_std_pos_messages
      ship_pos_messages  (cost=0.00..48111.95 rows=4323 width=128)
      (actual time=37.156..23782.030 rows=808692 loops=1)"

      "              Index Cond: (date_trunc('day'::text, msg_date_rec)
      = '2012-08-02 00:00:00'::timestamp without time zone)"

      "              Filter: (date_part('day'::text, msg_date_rec) =
      2::double precision)"

      "        ->  Index Scan using
      idx_ship_b_ext_pos_messages_date_trunc on ship_b_ext_pos_messages
      ship_pos_messages  (cost=0.00..1844.30 rows=154 width=128) (actual
      time=42.042..1270.104 rows=28656 loops=1)"

      "              Index Cond: (date_trunc('day'::text, msg_date_rec)
      = '2012-08-02 00:00:00'::timestamp without time zone)"

      "              Filter: (date_part('day'::text, msg_date_rec) =
      2::double precision)"

      "        ->  Index Scan using
      idx_ship_a_pos_messages_wk0_date_trunc on ship_a_pos_messages_wk0
      ship_pos_messages  (cost=0.00..1023568.99 rows=90663 width=128)
      (actual time=51.181..571590.415 rows=18054266 loops=1)"

      "              Index Cond: (date_trunc('day'::text, msg_date_rec)
      = '2012-08-02 00:00:00'::timestamp without time zone)"

      "              Filter: (date_part('day'::text, msg_date_rec) =
      2::double precision)"

      "  ->  Hash  (cost=177590.46..177590.46 rows=12311446 width=8)
      (actual time=3082.762..3082.762 rows=12311446 loops=1)"

      "        Buckets: 524288  Batches: 4  Memory Usage: 120316kB"

      "        ->  Seq Scan on tmp_tbl_messages 
      (cost=0.00..177590.46 rows=12311446 width=8) (actual
      time=0.022..1181.376 rows=12311446 loops=1)"

      "Total runtime: 634764.596 ms"

    
    The time looks reasonable but still quite high for the over night
    job I am need it for (have to run around 30 of those). So since the
    join has 

    been shorted I think I need to do something with the rows difference
    between actual and expected in the:

    
    "        ->  Index Scan using
      idx_ship_a_pos_messages_wk0_date_trunc on ship_a_pos_messages_wk0
      ship_pos_messages  (cost=0.00..1023568.99 rows=90663 width=128)
      (actual time=51.181..571590.415 rows=18054266 loops=1)"

      "              Index Cond: (date_trunc('day'::text, msg_date_rec)
      = '2012-08-02 00:00:00'::timestamp without time zone)"

      "              Filter: (date_part('day'::text, msg_date_rec) =
      2::double precision)"

      
      From what I understand a possible solution is to increase the
        stats target for the particular column(?). Any suggestion there?
        I assume we are talking about the msg_date_rec
        where the index is build uppon.

        Finally, I do understand what you say about the Seq scan.
        However in this case I have consistently about 10min per
        execution while the SeqScan was giving me almost nothing at best
        and usually it was running for so long that

        eventually was causing my server problems...

        
        Kind Regards

        Yiannis