Re: vacuum analyze slows sql query

John Meinel <john@xxxxxxxxxxxxxx> · Mon, 08 Nov 2004 13:29:39 -0600

patrick ~ wrote:
[...]
pkk=# explain analyze execute pkk_01(    241 );
      QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Result  (cost=2.66..2.67 rows=1 width=0) (actual time=2872.211..2872.213
rows=1 loops=1)
   InitPlan
     ->  Limit  (cost=0.00..2.66 rows=1 width=4) (actual
time=2872.189..2872.189 rows=0 loops=1)
           ->  Seq Scan on pkk_purchase p0  (cost=0.00..37225.83 rows=13983
width=4) (actual time=2872.180..2872.180 rows=0 loops=1)
                 Filter: ((offer_id = $1) AND (((expire_time)::timestamp with
time zone > now()) OR (expire_time IS NULL) OR (pending = true)) AND
((cancel_date IS NULL) OR (pending = true)))
 Total runtime: 2872.339 ms
(6 rows)

Time: 2873.479 ms

[...]

So, is this the ultimate solution to this issue?

--patrick

It's not so much that correlation is < 0.5. It sounds like you're 
running into the same issue that I ran into in the past. You have a 
column with lots of repeated values, and a few exceptional ones. Notice 
this part of the query:
->  Seq Scan on pkk_purchase p0  (cost rows=13983) (actual rows=0)

For a general number, it thinks it might return 14,000 rows, hence the 
sequential scan. Before you do ANALYZE, it uses whatever defaults exist, 
which are probably closer to reality.

The problem is that you probably have some values for pkk_purchase where 
it could return 14,000 rows (possibly much much more). And for those, 
seq scan is the best plan. However, for the particular value that you 
are testing, there are very few (no) entries in the table.

With a prepared statement (or a function) it has to determine ahead of 
time what the best query is without knowing what value you are going to 
 ask for.

Lets say for a second that you manage to trick it into using index scan, 
and then you actually call the function with one of the values that 
returns 1,000s of rows. Probably it will take 10-100 times longer than 
if it used a seq scan.

So what is the solution? The only one I'm aware of is to turn your 
static function into a dynamic one.

So somewhere within the function you build up a SQL query string and 
call EXECUTE str. This forces the query planner to be run every time you 
call the function. This means that if you call it will a "nice" value, 
you will get the fast index scan, and if you call it with a "bad" value, 
it will switch back to seq scan.

The downside is you don't get much of a benefit from using as stored 
procedure, as it has to run the query planner all the time (as though 
you issue the query manually each time.) But it still might be better 
for you in the long run.

Example:

instead of

create function test(int) returns int as '
declare
  x alias for $1;
  int y;
begin
  select into y ... from ... where id=x limit ...;
  return y;
end
';

use this format

create function test(int) returns int as '
declare
  x alias for $1;
  int y;
begin
  EXECUTE ''select into y ... from ... where id=''
	||quote_literal(x)
	|| '' limit ...'';
  return y;
end;
';

I think that will point you in the right direction.

John
=:->
Attachment:
signature.asc

Description: OpenPGP digital signature