Re: strange query plan with LIMIT

Pavel Stehule <pavel.stehule@xxxxxxxxx> · Tue, 7 Jun 2011 12:43:22 +0200

Hello

did you run a ANALYZE statement on table tdiag? A statistics are
absolutelly out.

Regards

Pavel Stehule

2011/6/7  <anthony.shipman@xxxxxxxxxxxxx>:
> Version: PostgreSQL 8.3.5 (mammoth replicator)
>
> Schema:
>
> CREATE TABLE tdiag (
> Â Âdiag_id Â Â Â Â Â Â integer DEFAULT nextval('diag_id_seq'::text),
> Â Âcreate_time Â Â Â Â timestamp with time zone default now(), /* time this record
> was created */
> Â Âdiag_time Â Â Â Â Â timestamp with time zone not null,
> Â Âdevice_id Â Â Â Â Â integer, Â Â Â Â Â Â Â Â/* optional */
> Â Âfleet_id Â Â Â Â Â Âinteger, Â Â Â Â Â Â Â Â/* optional */
> Â Âcustomer_id Â Â Â Â integer, Â Â Â Â Â Â Â Â/* optional */
> Â Âmodule Â Â Â Â Â Â Âcharacter varying,
> Â Ânode_kind Â Â Â Â Â smallint,
> Â Âdiag_level Â Â Â Â Âsmallint,
> Â Âtag Â Â Â Â Â Â Â Â character varying not null default '',
> Â Âmessage Â Â Â Â Â Â character varying not null default '',
> Â Âoptions Â Â Â Â Â Â text,
>
> Â ÂPRIMARY KEY (diag_id)
> );
>
> create index tdiag_create_time Â ON tdiag(create_time);
>
> The number of rows is around 33 million with time stamps over the past two
> weeks.
> A VACUUM ANALYZE has been done recently on the table.
>
> The create_time order is almost identical to the id order. ÂWhat I want
> to find is the first or last entry by id in a given time range. The
> query I am having a problem with is:
>
> symstream2=> explain analyze select * from tdiag where (create_time
>>= '2011-06-03
> 09:49:04.000000+0' and create_time < '2011-06-06 09:59:04.000000+0') order by
> diag_id limit 1;
>
>
> QUERY PLAN
> --------------------------------------------------------------------------------------------------------------------------------------------------------------
> ÂLimit Â(cost=0.00..16.75 rows=1 width=114) (actual time=69425.356..69425.358
> rows=1 loops=1)
> Â -> ÂIndex Scan using tdiag_pkey on tdiag Â(cost=0.00..19114765.76
> rows=1141019 width=114)
> (actual time=69425.352..69425.352 rows=1 loops=1)
> Â Â Â Â Filter: ((create_time >= '2011-06-03 19:49:04+10'::timestamp with
> time zone) AND
> (create_time < '2011-06-06 19:59:04+10'::timestamp with time zone))
> ÂTotal runtime: 69425.400 ms
>
> PG seems to decide it must scan the diag_id column and filter each row by the
> create_time.
>
>
>
> If I leave out the limit I get
>
> symstream2=> explain analyze select * from tdiag where (create_time
>>= '2011-06-03
> 09:49:04.000000+0' and create_time < '2011-06-06 09:59:04.000000+0') order by
> diag_id;
>
>
> QUERY PLAN
> ------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> ÂSort Â(cost=957632.43..960484.98 rows=1141019 width=114) (actual
> time=552.795..656.319 rows=86530
> loops=1)
> Â Sort Key: diag_id
> Â Sort Method: Âexternal merge ÂDisk: 9872kB
> Â -> ÂBitmap Heap Scan on tdiag Â(cost=25763.48..638085.13 rows=1141019
> width=114) (actual
> time=43.232..322.441 rows=86530 loops=1)
> Â Â Â Â Recheck Cond: ((create_time >= '2011-06-03 19:49:04+10'::timestamp
> with time zone) AND
> (create_time < '2011-06-06 19:59:04+10'::timestamp with time zone))
> Â Â Â Â -> ÂBitmap Index Scan on tdiag_create_time Â(cost=0.00..25478.23
> rows=1141019 width=0)
> (actual time=42.574..42.574 rows=86530 loops=1)
> Â Â Â Â Â Â Â Index Cond: ((create_time >= '2011-06-03
> 19:49:04+10'::timestamp with time zone) AND
> (create_time < '2011-06-06 19:59:04+10'::timestamp with time zone))
> ÂTotal runtime: 736.440 ms
> (8 rows)
>
>
>
>
> I can be explicit about the query order:
>
> select * into tt from tdiag where (create_time >= '2011-06-03
> 09:49:04.000000+0' and create_time <
> '2011-06-06 09:59:04.000000+0');
>
> symstream2=> explain analyze select * from tt order by diag_id limit 1;
> Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â QUERY PLAN
> ------------------------------------------------------------------------------------------------------------------------
> ÂLimit Â(cost=2731.95..2731.95 rows=1 width=101) (actual time=440.165..440.166
> rows=1 loops=1)
> Â -> ÂSort Â(cost=2731.95..2948.28 rows=86530 width=101) (actual
> time=440.161..440.161 rows=1
> loops=1)
> Â Â Â Â Sort Key: diag_id
> Â Â Â Â Sort Method: Âtop-N heapsort ÂMemory: 17kB
> Â Â Â Â -> ÂSeq Scan on tt Â(cost=0.00..2299.30 rows=86530 width=101) (actual
> time=19.602..330.873
> rows=86530 loops=1)
> ÂTotal runtime: 440.209 ms
> (6 rows)
>
>
>
> But if I try using a subquery I get
>
> symstream2=> explain analyze select * from (select * from tdiag where
> (create_time >= '2011-06-03
> 09:49:04.000000+0' and create_time < '2011-06-06 09:59:04.000000+0')) as sub
> order by diag_id limit
> 1;
>
>
> QUERY PLAN
> --------------------------------------------------------------------------------------------------------------------------------------------------------------
> ÂLimit Â(cost=0.00..16.75 rows=1 width=114) (actual time=90344.384..90344.385
> rows=1 loops=1)
> Â -> ÂIndex Scan using tdiag_pkey on tdiag Â(cost=0.00..19114765.76
> rows=1141019 width=114)
> (actual time=90344.380..90344.380 rows=1 loops=1)
> Â Â Â Â Filter: ((create_time >= '2011-06-03 19:49:04+10'::timestamp with
> time zone) AND
> (create_time < '2011-06-06 19:59:04+10'::timestamp with time zone))
> ÂTotal runtime: 90344.431 ms
>
>
> How do I make this both fast and simple?
> --
> Anthony Shipman Â Â Â Â Â Â Â Â | flailover systems: When one goes down it
> Anthony.Shipman@xxxxxxxxxxxxx Â | flails about until the other goes down too.
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>

-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance