Re: vacuum analyze slows sql query

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



patrick ~ wrote:
[...]
pkk=# explain analyze execute pkk_01(    241 );
      QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Result  (cost=2.66..2.67 rows=1 width=0) (actual time=2872.211..2872.213
rows=1 loops=1)
   InitPlan
     ->  Limit  (cost=0.00..2.66 rows=1 width=4) (actual
time=2872.189..2872.189 rows=0 loops=1)
           ->  Seq Scan on pkk_purchase p0  (cost=0.00..37225.83 rows=13983
width=4) (actual time=2872.180..2872.180 rows=0 loops=1)
                 Filter: ((offer_id = $1) AND (((expire_time)::timestamp with
time zone > now()) OR (expire_time IS NULL) OR (pending = true)) AND
((cancel_date IS NULL) OR (pending = true)))
 Total runtime: 2872.339 ms
(6 rows)

Time: 2873.479 ms


[...]

So, is this the ultimate solution to this issue?

--patrick

It's not so much that correlation is < 0.5. It sounds like you're running into the same issue that I ran into in the past. You have a column with lots of repeated values, and a few exceptional ones. Notice this part of the query:
->  Seq Scan on pkk_purchase p0  (cost rows=13983) (actual rows=0)

For a general number, it thinks it might return 14,000 rows, hence the sequential scan. Before you do ANALYZE, it uses whatever defaults exist, which are probably closer to reality.

The problem is that you probably have some values for pkk_purchase where it could return 14,000 rows (possibly much much more). And for those, seq scan is the best plan. However, for the particular value that you are testing, there are very few (no) entries in the table.

With a prepared statement (or a function) it has to determine ahead of time what the best query is without knowing what value you are going to ask for.

Lets say for a second that you manage to trick it into using index scan, and then you actually call the function with one of the values that returns 1,000s of rows. Probably it will take 10-100 times longer than if it used a seq scan.

So what is the solution? The only one I'm aware of is to turn your static function into a dynamic one.

So somewhere within the function you build up a SQL query string and call EXECUTE str. This forces the query planner to be run every time you call the function. This means that if you call it will a "nice" value, you will get the fast index scan, and if you call it with a "bad" value, it will switch back to seq scan.

The downside is you don't get much of a benefit from using as stored procedure, as it has to run the query planner all the time (as though you issue the query manually each time.) But it still might be better for you in the long run.

Example:

instead of

create function test(int) returns int as '
declare
  x alias for $1;
  int y;
begin
  select into y ... from ... where id=x limit ...;
  return y;
end
';

use this format

create function test(int) returns int as '
declare
  x alias for $1;
  int y;
begin
  EXECUTE ''select into y ... from ... where id=''
	||quote_literal(x)
	|| '' limit ...'';
  return y;
end;
';

I think that will point you in the right direction.

John
=:->

Attachment: signature.asc
Description: OpenPGP digital signature


[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux