Hello did you run a ANALYZE statement on table tdiag? A statistics are absolutelly out. Regards Pavel Stehule 2011/6/7 <anthony.shipman@xxxxxxxxxxxxx>: > Version: PostgreSQL 8.3.5 (mammoth replicator) > > Schema: > > CREATE TABLE tdiag ( >  Âdiag_id       integer DEFAULT nextval('diag_id_seq'::text), >  Âcreate_time     timestamp with time zone default now(), /* time this record > was created */ >  Âdiag_time      timestamp with time zone not null, >  Âdevice_id      integer,        Â/* optional */ >  Âfleet_id      Âinteger,        Â/* optional */ >  Âcustomer_id     integer,        Â/* optional */ >  Âmodule       Âcharacter varying, >  Ânode_kind      smallint, >  Âdiag_level     Âsmallint, >  Âtag         character varying not null default '', >  Âmessage       character varying not null default '', >  Âoptions       text, > >  ÂPRIMARY KEY (diag_id) > ); > > create index tdiag_create_time  ON tdiag(create_time); > > The number of rows is around 33 million with time stamps over the past two > weeks. > A VACUUM ANALYZE has been done recently on the table. > > The create_time order is almost identical to the id order. ÂWhat I want > to find is the first or last entry by id in a given time range. The > query I am having a problem with is: > > symstream2=> explain analyze select * from tdiag where (create_time >>= '2011-06-03 > 09:49:04.000000+0' and create_time < '2011-06-06 09:59:04.000000+0') order by > diag_id limit 1; > > > QUERY PLAN > -------------------------------------------------------------------------------------------------------------------------------------------------------------- > ÂLimit Â(cost=0.00..16.75 rows=1 width=114) (actual time=69425.356..69425.358 > rows=1 loops=1) >  -> ÂIndex Scan using tdiag_pkey on tdiag Â(cost=0.00..19114765.76 > rows=1141019 width=114) > (actual time=69425.352..69425.352 rows=1 loops=1) >     Filter: ((create_time >= '2011-06-03 19:49:04+10'::timestamp with > time zone) AND > (create_time < '2011-06-06 19:59:04+10'::timestamp with time zone)) > ÂTotal runtime: 69425.400 ms > > PG seems to decide it must scan the diag_id column and filter each row by the > create_time. > > > > If I leave out the limit I get > > symstream2=> explain analyze select * from tdiag where (create_time >>= '2011-06-03 > 09:49:04.000000+0' and create_time < '2011-06-06 09:59:04.000000+0') order by > diag_id; > > > QUERY PLAN > ------------------------------------------------------------------------------------------------------------------------------------------------------------------------ > ÂSort Â(cost=957632.43..960484.98 rows=1141019 width=114) (actual > time=552.795..656.319 rows=86530 > loops=1) >  Sort Key: diag_id >  Sort Method: Âexternal merge ÂDisk: 9872kB >  -> ÂBitmap Heap Scan on tdiag Â(cost=25763.48..638085.13 rows=1141019 > width=114) (actual > time=43.232..322.441 rows=86530 loops=1) >     Recheck Cond: ((create_time >= '2011-06-03 19:49:04+10'::timestamp > with time zone) AND > (create_time < '2011-06-06 19:59:04+10'::timestamp with time zone)) >     -> ÂBitmap Index Scan on tdiag_create_time Â(cost=0.00..25478.23 > rows=1141019 width=0) > (actual time=42.574..42.574 rows=86530 loops=1) >        Index Cond: ((create_time >= '2011-06-03 > 19:49:04+10'::timestamp with time zone) AND > (create_time < '2011-06-06 19:59:04+10'::timestamp with time zone)) > ÂTotal runtime: 736.440 ms > (8 rows) > > > > > I can be explicit about the query order: > > select * into tt from tdiag where (create_time >= '2011-06-03 > 09:49:04.000000+0' and create_time < > '2011-06-06 09:59:04.000000+0'); > > symstream2=> explain analyze select * from tt order by diag_id limit 1; >                            QUERY PLAN > ------------------------------------------------------------------------------------------------------------------------ > ÂLimit Â(cost=2731.95..2731.95 rows=1 width=101) (actual time=440.165..440.166 > rows=1 loops=1) >  -> ÂSort Â(cost=2731.95..2948.28 rows=86530 width=101) (actual > time=440.161..440.161 rows=1 > loops=1) >     Sort Key: diag_id >     Sort Method: Âtop-N heapsort ÂMemory: 17kB >     -> ÂSeq Scan on tt Â(cost=0.00..2299.30 rows=86530 width=101) (actual > time=19.602..330.873 > rows=86530 loops=1) > ÂTotal runtime: 440.209 ms > (6 rows) > > > > But if I try using a subquery I get > > symstream2=> explain analyze select * from (select * from tdiag where > (create_time >= '2011-06-03 > 09:49:04.000000+0' and create_time < '2011-06-06 09:59:04.000000+0')) as sub > order by diag_id limit > 1; > > > QUERY PLAN > -------------------------------------------------------------------------------------------------------------------------------------------------------------- > ÂLimit Â(cost=0.00..16.75 rows=1 width=114) (actual time=90344.384..90344.385 > rows=1 loops=1) >  -> ÂIndex Scan using tdiag_pkey on tdiag Â(cost=0.00..19114765.76 > rows=1141019 width=114) > (actual time=90344.380..90344.380 rows=1 loops=1) >     Filter: ((create_time >= '2011-06-03 19:49:04+10'::timestamp with > time zone) AND > (create_time < '2011-06-06 19:59:04+10'::timestamp with time zone)) > ÂTotal runtime: 90344.431 ms > > > How do I make this both fast and simple? > -- > Anthony Shipman         | flailover systems: When one goes down it > Anthony.Shipman@xxxxxxxxxxxxx  | flails about until the other goes down too. > > -- > Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-performance > -- Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance