Re: performance question (something to do w/ parameterized

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Mark Lewis wrote:
Doing a SELECT with a large list of variables inside an IN runs slowly
on every database we've tested.  We've tested mostly in Oracle and
PostgreSQL, and both get very slow very quickly (actually Oracle refuses
to process the query at all after it gets too many bind parameters).

In our case, we have a (potentially quite large) set of external values
that we want to look up in the database.  We originally thought that
doing a single select with a large IN clause was the way to go, but then
we did some performance analysis on the optimal batch size (number of
items to include per IN clause), and discovered that for most databases,
the optimal batch size was 1.  For PostgreSQL I think it was 2.

So that is for parameterized queries (the batch size?).

In my case, I was concerned about latency between app and database server, so I try to minimize the number of queries I send to the database server. (My app servers can be anywhere, they /should/ be close to the database server, but there are no guarantees and I can't control it).

The last time I tested for optimal batch size using non-parameterized queries with same-host database and app, I got a batch size of approximately 700 IN list elements (again, not variables in that test).
That was on postgres 7.X.Y.

Guess I'll have to try a test where I turn the parameterized statements into regular statements.

I'm pretty sure it would be a bad idea for me to send one IN list element at a time in all cases. Even if the query query prep was fast, the network latency could kill my app.


The moral of the story is that you're probably better off running a
bunch of small selects than in trying to optimize things with one
gargantuan select.

The algorithm currently tries to ensure that IN-lists of not more than 700 elements are sent to the database server, and breaks them into multiple queries. If it has to break it into at least 3 queries, it uses parameterized statements for the first 2+ and then a non-parameterized statement for the last one (which may have a different number of IN list elements than the prior batches).


[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux