Mark Lewis wrote:
Doing a SELECT with a large list of variables inside an IN runs slowly
on every database we've tested. We've tested mostly in Oracle and
PostgreSQL, and both get very slow very quickly (actually Oracle refuses
to process the query at all after it gets too many bind parameters).
In our case, we have a (potentially quite large) set of external values
that we want to look up in the database. We originally thought that
doing a single select with a large IN clause was the way to go, but then
we did some performance analysis on the optimal batch size (number of
items to include per IN clause), and discovered that for most databases,
the optimal batch size was 1. For PostgreSQL I think it was 2.
So that is for parameterized queries (the batch size?).
In my case, I was concerned about latency between app and database
server, so I try to minimize the number of queries I send to the
database server. (My app servers can be anywhere, they /should/ be
close to the database server, but there are no guarantees and I can't
control it).
The last time I tested for optimal batch size using non-parameterized
queries with same-host database and app, I got a batch size of
approximately 700 IN list elements (again, not variables in that test).
That was on postgres 7.X.Y.
Guess I'll have to try a test where I turn the parameterized statements
into regular statements.
I'm pretty sure it would be a bad idea for me to send one IN list
element at a time in all cases. Even if the query query prep was fast,
the network latency could kill my app.
The moral of the story is that you're probably better off running a
bunch of small selects than in trying to optimize things with one
gargantuan select.
The algorithm currently tries to ensure that IN-lists of not more than
700 elements are sent to the database server, and breaks them into
multiple queries. If it has to break it into at least 3 queries, it
uses parameterized statements for the first 2+ and then a
non-parameterized statement for the last one (which may have a different
number of IN list elements than the prior batches).