join to view over custom aggregate seems like it should be faster

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



I have an odd performance issue on 8.2 that I'd thought I'd document
here.  I have a workaround, but I'm if there is something that I'm not
seeing.

ok, for starters:
I have a large table that is basically organized like this:
create table big
(
 key1 int,
 key2 int,
 ts timestamp
 [other fields]
);

and a view most_recent_big which lists for each combination of key1
and key2, the '[other fields]' that are behind the highest (most
recent) timestamp.  The original view implementation involved a self
join which is the classic sql approach to pulling values from a
denormalized table (the real solution of course is to normalize the
data but I can't do that for various reasons).  This wasn't very fast,
so I wrote a custom aggregate to optimize the view (there are usuallly
very small #s of records for key1, key2 pair:

create view latest_big_view as
 select key1, key2, max_other_fields[other fields]
 from big
 group by key1, key2;

This worked very well, but sometimes the index on key1, key2 does not
get utilized when joining against latest_big_view.  Let's say I have a
number of key1, key2 pairs in another table:

for example:
select * from foo, latest_big_view using (key1, key2);
breaks down.

here is a example of the 'breakdown' plan on real tables. selecting a
single record from the view is very fast...1ms or less.  The join
can't 'see through' the view to filter the index.

dev20400=# explain analyze select * from foo join latest_download
using (host_id, software_binary_id);

    QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------
Hash Join  (cost=15.35..4616.65 rows=25 width=90) (actual
time=229.623..10601.317 rows=494 loops=1)
  Hash Cond: ((latest_download.host_id = foo.host_id) AND
(latest_download.software_binary_id = foo.software_binary_id))
  ->  GroupAggregate  (cost=0.00..4499.01 rows=4535 width=94) (actual
time=0.346..10370.383 rows=37247 loops=1)
        ->  Index Scan using software_download_idx on
software_download  (cost=0.00..2526.53 rows=45342 width=94) (actual
time=0.028..344.591
        SubPlan
          ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual
time=0.006..0.011 rows=1 loops=37247)
          ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual
time=0.006..0.011 rows=1 loops=37247)
          ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual
time=0.005..0.011 rows=1 loops=37247)
          ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual
time=0.006..0.011 rows=1 loops=37247)
          ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual
time=0.006..0.011 rows=1 loops=37247)
          ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual
time=0.006..0.011 rows=1 loops=37247)
          ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual
time=0.005..0.011 rows=1 loops=37247)
          ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual
time=0.005..0.011 rows=1 loops=37247)
          ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual
time=0.006..0.011 rows=1 loops=37247)
          ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual
time=0.005..0.010 rows=1 loops=37247)
  ->  Hash  (cost=7.94..7.94 rows=494 width=8) (actual
time=5.568..5.568 rows=494 loops=1)
        ->  Seq Scan on foo  (cost=0.00..7.94 rows=494 width=8)
(actual time=0.018..2.686 rows=494 loops=1)
Total runtime: 10604.260 ms
(18 rows)


Here is the same query but on the root table, instead of the view:
dev20400=# explain analyze select * from foo join software_download
using (host_id, software_binary_id);
                                                                   QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop  (cost=0.00..1521.60 rows=19 width=94) (actual
time=0.084..24.992 rows=607 loops=1)
  ->  Seq Scan on foo  (cost=0.00..7.94 rows=494 width=8) (actual
time=0.044..2.753 rows=494 loops=1)
  ->  Index Scan using software_download_idx on software_download
(cost=0.00..3.05 rows=1 width=94) (actual time=0.011..0.019 rows=1
loops=49
        Index Cond: ((foo.host_id = software_download.host_id) AND
(foo.software_binary_id = software_download.software_binary_id))
Total runtime: 28.385 ms
(5 rows)

I can use a trick with a function to make the view give out reasonalbe results:

create function foo(int, int) returns latest_download as
$$ select * from latest_download where software_binary_id = $1 and
host_id = $2; $$ language sql;

dev20400=# explain analyze select (v).* from (select
foo(software_binary_id, host_id) as v from foo) q;
                                              QUERY PLAN
---------------------------------------------------------------------------------------------------------
Subquery Scan q  (cost=0.00..14.12 rows=494 width=32) (actual
time=1.436..139.644 rows=494 loops=1)
  ->  Seq Scan on foo  (cost=0.00..9.18 rows=494 width=8) (actual
time=1.414..131.144 rows=494 loops=1)
Total runtime: 142.887 ms
(3 rows)

Time: 144.306 ms

merlin


[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux