Your answer seemed to get the point. index on telegram_id(type=integer) column can't be used for the filter condition below because type mismatches. ((telegram_id)::numeric = ANY ('{66484,132362}'::numeric[]))" ________________________________ > Date: Thu, 15 May 2014 17:31:10 +0900 > Subject: Re: FW: [PERFORM] how do functions affect query plan? > From: silent0608@xxxxxxxxx > To: pgsql-performance@xxxxxxxxxxxxxx > > hi > > i think the telegram_id's type should be integer. > > please change telegram_id to numeric and try to run the the following > sql. the index should be used. > > explain SELECT md.* > FROM measure_data md > where telegram_id in (trunc(66484.2),trunc(132362.1 )) > > > 2014-05-15 17:28 GMT+09:00 changchao > <chang-chao@xxxxxxxxxxx<mailto:chang-chao@xxxxxxxxxxx>>: > > > ---------------------------------------- > > From: chang-chao@xxxxxxxxxxx<mailto:chang-chao@xxxxxxxxxxx> > > To: > pgsql-performance@xxxxxxxxxxxxxx<mailto:pgsql-performance@xxxxxxxxxxxxxx> > > Subject: Re: [PERFORM] how do functions affect query plan? > > Date: Thu, 15 May 2014 16:59:30 +0900 > > > > > > > > Interestingly,adding type cast made postgresql wiser. > > Anyone knows the reason? > > > > 1.no<http://1.no> type cast > > SELECT md.* > > FROM measure_data md > > where telegram_id in (trunc(66484.2),trunc(132362.1 )) > > > > > > "Seq Scan on measure_data md (cost=0.00..459455.40 rows=205546 > width=28) (actual time=77.144..6458.870 rows=624 loops=1)" > > " Filter: ((telegram_id)::numeric = ANY ('{66484,132362}'::numeric[]))" > > " Rows Removed by Filter: 20553936" > > "Total runtime: 6458.921 ms" > > > > > > 2.type cast > > > > SELECT md.* > > FROM measure_data md > > where telegram_id in (trunc(66484.2)::int,trunc(132362.1 )::int) > > > > "Bitmap Heap Scan on measure_data md (cost=16.06..2618.86 rows=684 > width=28) (actual time=0.076..0.154 rows=624 loops=1)" > > " Recheck Cond: (telegram_id = ANY ('{66484,132362}'::integer[]))" > > " -> Bitmap Index Scan on index_measure_data_telegram_id > (cost=0.00..15.88 rows=684 width=0) (actual time=0.065..0.065 rows=624 > loops=1)" > > " Index Cond: (telegram_id = ANY ('{66484,132362}'::integer[]))" > > "Total runtime: 0.187 ms" > > > > > > ---------------------------------------- > >> From: chang-chao@xxxxxxxxxxx<mailto:chang-chao@xxxxxxxxxxx> > >> To: david.g.johnston@xxxxxxxxx<mailto:david.g.johnston@xxxxxxxxx>; > pgsql-performance@xxxxxxxxxxxxxx<mailto:pgsql-performance@xxxxxxxxxxxxxx> > >> Subject: Re: [PERFORM] how do functions affect query plan? > >> Date: Thu, 15 May 2014 15:19:13 +0900 > >> > >> Hi,David > >> > >> Seems that the root of evil is in the function(random,trunc), > >> although I don't know why. > >> > >> Here is the comparison. > >> > >> 1.w/o function : index is wisely used.(Even without the limit 30 clause) > >> > >> explain analyze > >> SELECT md.* > >> FROM measure_data md > >> where telegram_id in > >> ( > >> SELECT 66484 + (132363-66484)/30 * i > >> FROM generate_series(1,30) as s(i) > >> limit 30 > >> ) > >> ; > >> > >> "Nested Loop (cost=10.01..39290.79 rows=10392 width=28) (actual > time=0.079..3.490 rows=9360 loops=1)" > >> " -> HashAggregate (cost=0.83..1.13 rows=30 width=4) (actual > time=0.027..0.032 rows=30 loops=1)" > >> " -> Limit (cost=0.00..0.45 rows=30 width=4) (actual > time=0.013..0.020 rows=30 loops=1)" > >> " -> Function Scan on generate_series s (cost=0.00..15.00 rows=1000 > width=4) (actual time=0.011..0.016 rows=30 loops=1)" > >> " -> Bitmap Heap Scan on measure_data md (cost=9.19..1306.20 > rows=346 width=28) (actual time=0.030..0.075 rows=312 loops=30)" > >> " Recheck Cond: (telegram_id = ((66484 + (2195 * s.i))))" > >> " -> Bitmap Index Scan on index_measure_data_telegram_id > (cost=0.00..9.10 rows=346 width=0) (actual time=0.025..0.025 rows=312 > loops=30)" > >> " Index Cond: (telegram_id = ((66484 + (2195 * s.i))))" > >> "Total runtime: 3.714 ms" > >> > >> > >> 2.when function is there: seq scan > >> > >> explain analyze > >> SELECT md.* > >> FROM measure_data md > >> where telegram_id in > >> ( > >> SELECT trunc((132363-66484) * random()) +66484 > >> FROM generate_series(1,30) as s(i) > >> limit 30 > >> ) > >> ; > >> > >> > >> "Hash Join (cost=1.65..490288.89 rows=10277280 width=28) (actual > time=0.169..4894.847 rows=9360 loops=1)" > >> " Hash Cond: ((md.telegram_id)::double precision = > ((trunc((65879::double precision * random())) + 66484::double > precision)))" > >> " -> Seq Scan on measure_data md (cost=0.00..356682.60 rows=20554560 > width=28) (actual time=0.010..2076.932 rows=20554560 loops=1)" > >> " -> Hash (cost=1.28..1.28 rows=30 width=8) (actual > time=0.041..0.041 rows=30 loops=1)" > >> " Buckets: 1024 Batches: 1 Memory Usage: 2kB" > >> " -> HashAggregate (cost=0.98..1.28 rows=30 width=8) (actual > time=0.034..0.036 rows=30 loops=1)" > >> " -> Limit (cost=0.00..0.60 rows=30 width=0) (actual > time=0.016..0.026 rows=30 loops=1)" > >> " -> Function Scan on generate_series s (cost=0.00..20.00 rows=1000 > width=0) (actual time=0.015..0.023 rows=30 loops=1)" > >> "Total runtime: 4895.239 ms" > >> > >> > >> ---------------------------------------- > >>> Date: Wed, 14 May 2014 22:43:24 -0700 > >>> From: david.g.johnston@xxxxxxxxx<mailto:david.g.johnston@xxxxxxxxx> > >>> To: > pgsql-performance@xxxxxxxxxxxxxx<mailto:pgsql-performance@xxxxxxxxxxxxxx> > >>> Subject: Re: [PERFORM] how do functions affect query plan? > >>> > >>> 常超 wrote > >>>> Hi,all > >>>> I have a table to save received measure data. > >>>> > >>>> > >>>> CREATE TABLE measure_data > >>>> ( > >>>> id serial NOT NULL, > >>>> telegram_id integer NOT NULL, > >>>> measure_time timestamp without time zone NOT NULL, > >>>> item_id integer NOT NULL, > >>>> val double precision, > >>>> CONSTRAINT measure_data_pkey PRIMARY KEY (id) > >>>> ); > >>>> > >>>> CREATE INDEX index_measure_data_telegram_id ON measure_data USING btree > >>>> (telegram_id); > >>>> > >>>> > >>>> in my scenario,a telegram contains measure data for multiple data items > >>>> and timestamps, > >>>> BTW,another table is for telegram. > >>>> > >>>> The SQL I used in my application is > >>>> select * from measure_data where telegram_id in(1,2,...,n) > >>>> and this query used the index_measure_data_telegram_id index,as > expected. > >>>> > >>>> In order to see the performance of my query , > >>>> I used the following query to search the measure data for randomly 30 > >>>> telegrams. > >>>> > >>>> > >>>> explain analyze > >>>> SELECT md.* > >>>> FROM measure_data md > >>>> where telegram_id in > >>>> ( > >>>> SELECT distinct > >>>> trunc((132363-66484) * random() + 66484) > >>>> FROM generate_series(1,30) as s(telegram_id) > >>>> ) > >>>> ; > >>>> > >>>> the 132363 and 66484 are the max and min of the telegram id,separately. > >>>> > >>>> What surprised me is that index is not used,instead,a seq scan is > >>>> performed on measure_data. > >>>> Although,intuitively,in this case,it is much wiser to use the index. > >>>> Would you please give some clue to why this happened? > >>>> > >>>> "Hash Semi Join (cost=65.00..539169.32 rows=10277280 width=28) (actual > >>>> time=76.454..17177.054 rows=9360 loops=1)" > >>>> " Hash Cond: ((md.telegram_id)::double precision = > (trunc(((65879::double > >>>> precision * random()) + 66484::double precision))))" > >>>> " -> Seq Scan on measure_data md (cost=0.00..356682.60 rows=20554560 > >>>> width=28) (actual time=0.012..13874.809 rows=20554560 loops=1)" > >>>> " -> Hash (cost=52.50..52.50 rows=1000 width=8) (actual > >>>> time=0.062..0.062 rows=30 loops=1)" > >>>> " Buckets: 1024 Batches: 1 Memory Usage: 2kB" > >>>> " -> HashAggregate (cost=22.50..42.50 rows=1000 width=0) (actual > >>>> time=0.048..0.053 rows=30 loops=1)" > >>>> " -> Function Scan on generate_series s (cost=0.00..20.00 > >>>> rows=1000 width=0) (actual time=0.020..0.034 rows=30 loops=1)" > >>>> "Total runtime: 17177.527 ms" > >>> > >>> The planner expects to need to return half the table when you > provide 1,000 > >>> distinct telegram_ids, which is best handled by scanning the whole table > >>> sequentially and tossing out invalid data. > >>> > >>> I am curious if the plan will be different if you added a LIMIT 30 to the > >>> sub-query. > >>> > >>> The root of the problem is the planner has no way of knowing whether > >>> generate_series is going to return 1 or 1,000,000 rows so by > default it (and > >>> all functions) are assumed (by the planner) to return 1,000 rows. > By adding > >>> an explicit limit you can better inform the planner as to how many > rows you > >>> are going to be passing up to the parent query and it will > hopefully, with > >>> knowledge of only 30 distinct values, use the index. > >>> > >>> > >>> > >>> > >>> -- > >>> View this message in context: > http://postgresql.1045698.n5.nabble.com/how-do-functions-affect-query-plan-tp5803993p5803996.html > >>> Sent from the PostgreSQL - performance mailing list archive at > Nabble.com. > >>> > >>> > >>> -- > >>> Sent via pgsql-performance mailing list > (pgsql-performance@xxxxxxxxxxxxxx<mailto:pgsql-performance@xxxxxxxxxxxxxx>) > >>> To make changes to your subscription: > >>> http://www.postgresql.org/mailpref/pgsql-performance > >> > >> -- > >> Sent via pgsql-performance mailing list > (pgsql-performance@xxxxxxxxxxxxxx<mailto:pgsql-performance@xxxxxxxxxxxxxx>) > >> To make changes to your subscription: > >> http://www.postgresql.org/mailpref/pgsql-performance > > > > -- > > Sent via pgsql-performance mailing list > (pgsql-performance@xxxxxxxxxxxxxx<mailto:pgsql-performance@xxxxxxxxxxxxxx>) > > To make changes to your subscription: > > http://www.postgresql.org/mailpref/pgsql-performance > >