On Mon, May 08, 2006 at 12:50:13PM -0500, Jim C. Nasby wrote:
> On Mon, May 08, 2006 at 10:42:21AM -0700, Mark Lewis wrote:
> > Doing a SELECT with a large list of variables inside an IN runs slowly
> > on every database we've tested. We've tested mostly in Oracle and
> > PostgreSQL, and both get very slow very quickly (actually Oracle refuses
> > to process the query at all after it gets too many bind parameters).
> >
> > In our case, we have a (potentially quite large) set of external values
> > that we want to look up in the database. We originally thought that
> > doing a single select with a large IN clause was the way to go, but then
> > we did some performance analysis on the optimal batch size (number of
> > items to include per IN clause), and discovered that for most databases,
> > the optimal batch size was 1. For PostgreSQL I think it was 2.
> >
> > The moral of the story is that you're probably better off running a
> > bunch of small selects than in trying to optimize things with one
> > gargantuan select.
>
> Ever experiment with loading the parameters into a temp table and
> joining to that?
> --
> Jim C. Nasby, Sr. Engineering Consultant    jnasby@xxxxxxxxxxxxx
> Pervasive Software    http://pervasive.com    work: 512-231-6117
> vcard: http://jim.nasby.net/pervasive.vcf    cell: 512-569-9461

The DB used by the DSPAM software is very similar to your use case. The
fastest queries are made using the PostgreSQL generate_series
functionality to unwind the "IN (...)" list into multiple single
selects. Here is the lookup function that they use:

create function lookup_tokens(integer, bigint[])
  returns setof dspam_token_data
  language plpgsql stable
as '
declare
  v_rec record;
begin
  for v_rec in select * from dspam_token_data
    where uid = $1
      and token in (select $2[i]
                      from generate_series(array_lower($2,1),
                                           array_upper($2,1)) s(i))
  loop
    return next v_rec;
  end loop;
  return;
end;';

You should be able to try something similar for your workload.

Ken Marshall
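
For illustration only, the function is called with the external values
packed into a single array parameter, along these lines (the uid and
token values below are just made-up examples, not anything from the
DSPAM schema):

  select * from lookup_tokens(1, ARRAY[1001, 1002, 1003]::bigint[]);

Adapting the pattern to your workload should mostly be a matter of
swapping in your own table and key column. The generate_series subquery
turns the array into a small row set, so the planner can do one indexed
lookup per value rather than chewing on one enormous literal IN list.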