Re: Select running slow on Postgres

"Scott Marlowe" <scott.marlowe@xxxxxxxxx> · Wed, 2 Jul 2008 13:07:45 -0600

On Wed, Jul 2, 2008 at 1:01 PM, samantha mahindrakar
<sam.mahindrakar@xxxxxxxxx> wrote:
> I ran the explain analyze.Here is what i got:
>
>
> "Group  (cost=112266.37..112266.40 rows=1 width=56) (actual
> time=5583.399..5615.476 rows=13373 loops=1)"
> "  ->  Sort  (cost=112266.37..112266.38 rows=1 width=56) (actual
> time=5583.382..5590.890 rows=13373 loops=1)"
> "        Sort Key: lane_data_07_08.lane_id,
> lane_data_07_08.measurement_start, lane_data_07_08.measurement_end,
> lane_data_07_08.speed, lane_data_07_08.volume, lane_data_07_08.occupancy,
> lane_data_07_08.quality, lane_data_07_08.effective_date"
> "        ->  Nested Loop IN Join  (cost=0.00..112266.36 rows=1 width=56)
> (actual time=1100.307..5547.768 rows=13373 loops=1)"
> "              ->  Seq Scan on lane_data_07_08  (cost=0.00..112241.52 rows=3
> width=56) (actual time=1087.666..5341.662 rows=20581 loops=1)"

You can see here that the seq scan on lane_data is what's eating up
all your time.  Also, since the row estimate is WAY off, it then chose
a nested loop thinking it would be joining up only 1 row and actually
running across 20k rows.

> "                    Filter: (((volume = 255::double precision) OR (speed =
> 255::double precision) OR (occupancy = 255::double precision) OR (occupancy
>>= 100::double precision) OR (volume > 52::double precision) OR (volume <
> 0::double precision) OR (speed > 120::double precision) OR (speed <
> 0::double precision)) AND (date_part('hour'::text, measurement_start) >=
> 5::double precision) AND (date_part('hour'::text, measurement_start) <=
> 23::double precision) AND (date_part('day'::text, measurement_start) =
> 1::double precision))"
> "              ->  Index Scan using lane_info_pk on
> lane_info  (cost=0.00..8.27 rows=1 width=4) (actual time=0.007..0.007 rows=1
> loops=20581)"
> "                    Index Cond: (lane_data_07_08.lane_id =
> lane_info.lane_id)"
> "                    Filter: (inactive IS NULL)"
> "Total runtime: 5621.409 ms"
>
>
> Well instaed of creating extra indexes (since they eat up lot of space) i
> made use of the whole measurement_start field, so thet it uses the index
> proeprty and makes the search faster.
> So i changed the query to include the measuerment start as follows:
>
> SELECT lane_id,measurement_start,
> measurement_end,speed,volume,occupancy,quality,effective_date
> FROM tss.lane_data_06_08
> WHERE lane_id in(select lane_id from lane_info where inactive is  null )
> AND measurement_start between '2008-06-30 05:00:00-04' AND  '2008-06-30
> 23:00:00-04'
> GROUP BY
> lane_id,measurement_start,measurement_end,speed,volume,occupancy,quality,effective_date
> ORDER BY lane_id, measurement_start

Yeah, anytime you can just compare date / timestamp on an indexed
field you'll do better.  If you find yourself needing to use the other
syntax, so you can, for instance, grab the data for 5 days in a row
from 5am to 11am or something, then the method I mentioned of making
indexes on date_part are a good choice.  Note that you need regular
timestamp, not timstamptz to create indexes.