Re: performance question (something to do w/

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, May 08, 2006 at 12:50:13PM -0500, Jim C. Nasby wrote:
> On Mon, May 08, 2006 at 10:42:21AM -0700, Mark Lewis wrote:
> > Doing a SELECT with a large list of variables inside an IN runs slowly
> > on every database we've tested.  We've tested mostly in Oracle and
> > PostgreSQL, and both get very slow very quickly (actually Oracle refuses
> > to process the query at all after it gets too many bind parameters).
> > 
> > In our case, we have a (potentially quite large) set of external values
> > that we want to look up in the database.  We originally thought that
> > doing a single select with a large IN clause was the way to go, but then
> > we did some performance analysis on the optimal batch size (number of
> > items to include per IN clause), and discovered that for most databases,
> > the optimal batch size was 1.  For PostgreSQL I think it was 2.
> > 
> > The moral of the story is that you're probably better off running a
> > bunch of small selects than in trying to optimize things with one
> > gargantuan select.
> 
> Ever experiment with loading the parameters into a temp table and
> joining to that?
> -- 
> Jim C. Nasby, Sr. Engineering Consultant      jnasby@xxxxxxxxxxxxx
> Pervasive Software      http://pervasive.com    work: 512-231-6117
> vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461
> 
> ---------------------------(end of broadcast)---------------------------

The DB use by the DSPAM software is very similar to your use case. The
fastest queries are made using the PostgreSQL generate_series functionality
to unwind the "IN *" to multiple single selects. Here is the lookup function
that they use:

create function lookup_tokens(integer,bigint[])
  returns setof dspam_token_data
  language plpgsql stable
  as '
declare
  v_rec record;
begin
  for v_rec in select * from dspam_token_data
                where uid=$1
                  and token in (select $2[i]
                                  from generate_series(array_lower($2,1),
                                                       array_upper($2,1)) s(i))
  loop
    return next v_rec;
  end loop;
  return;
end;';


You should be able to try something similar for your workload.

Ken Marshall


[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux