Mark Roberts wrote:
On Tue, 2008-11-18 at 17:38 +0100, tv@xxxxxxxx wrote:
I bet there is no 'critical' length - this is just another case of index scan vs. seqscan. The efficiency depends on the size of the table / row, the amount of data in the table, the variability of the column used in the IN clause, etc.
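A quick way to check which plan the planner actually picks for a given IN list is EXPLAIN ANALYZE; a minimal sketch, with a hypothetical table t and indexed column id:

    -- compare the chosen plan (index scan vs. seq scan) and the actual runtime
    EXPLAIN ANALYZE
    SELECT * FROM t WHERE id IN (1, 2, 3);  -- extend the list as needed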
Splitting a query with 1000 items into 10 separate queries, the smaller queries may be faster, but the total time consumed may actually be higher. Something like

10 * (time of small query) + (time to combine them) > (time of large query)
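Concretely, the split means turning one large IN list into several smaller ones and merging the results in the application; a sketch with a hypothetical table t and column id:

    -- one large query:
    SELECT * FROM t WHERE id IN (1, 2, /* ... */ 1000);

    -- versus ten smaller ones, results combined by the application:
    SELECT * FROM t WHERE id IN (1, 2, /* ... */ 100);
    SELECT * FROM t WHERE id IN (101, 102, /* ... */ 200);
    -- ... eight more batches ...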
If the performance of the 'split' solution is actually better than the original query, it just means that the planner does not use an index scan when it actually should. That means that either

(a) the planner is not smart enough
(b) it does not have current statistics for the table (run ANALYZE on the table)
(c) the statistics are not detailed enough (ALTER TABLE ... SET STATISTICS)
(d) the cost variables are not set properly (they do not match the hardware - decrease the index scan cost / increase the seq scan cost; see the sketch below)
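A minimal sketch of fixes (b) through (d), assuming a hypothetical table t whose column id is used in the IN clause; the statistics target and cost value are illustrative, not recommendations:

    -- (b) refresh the planner's statistics for the table
    ANALYZE t;

    -- (c) collect a larger histogram for the filtered column,
    -- then re-analyze so the new target takes effect
    ALTER TABLE t ALTER COLUMN id SET STATISTICS 1000;
    ANALYZE t;

    -- (d) make index (random) access look cheaper relative to seq scans;
    -- the default random_page_cost is 4.0
    SET random_page_cost = 2.0;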
regards
Tomas
I know that it's much faster (for us) to run many smaller queries than one large query, and I think that it's primarily because of your reason (a). Most of our problems come from Pg misunderstanding the results of a join and making a bad plan decision. Batching dramatically reduces the likelihood of this.
As I already said - even the smartest planner won't work without correct input data. Have you tried fixing points (b), (c) and (d)? Fixing them might improve the planner's performance so that you don't need the batching at all.
regards
Tomas