Re: join to view over custom aggregate seems like it should be faster

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 4/9/07, Merlin Moncure <mmoncure@xxxxxxxxx> wrote:
On 4/9/07, Tom Lane <tgl@xxxxxxxxxxxxx> wrote:
> "Merlin Moncure" <mmoncure@xxxxxxxxx> writes:
> > I have an odd performance issue on 8.2 that I'd thought I'd document
> > here.  I have a workaround, but I'm if there is something that I'm not
> > seeing.
>
> It's hard to comment on this without seeing the full details of the view
> and tables.  I'm wondering where the SubPlans are coming from, for instance.

ah, it looks like the aggregate is being re-expanded for each field
returned by the aggregate.  I notice this for non-trivial record
returning functions also.  standard m.o. is to push into a subquery
and expand afterwords.

[sorry for the deluge of info]

I cleaned up the view from:
create or replace view latest_download as
select software_binary_id, host_id,
  ((
    select latest_software_download(
          (bds_status_id,
          mtime,
          dl_window_open,
          dl_window_close,
          download_start,
          download_stop,
          info,
          userupgradeable,
          overrideflag,
          percent_complete)::software_download_data)
  )::software_download_data).*
  from software_download group by host_id, software_binary_id;

to this:
create or replace view latest_download as
 select software_binary_id, host_id, (v).* from
   (
     select
       software_binary_id, host_id,
       latest_software_download(
       (bds_status_id,
       mtime,
       dl_window_open,
       dl_window_close,
       download_start,
       download_stop,
       info,
       userupgradeable,
       overrideflag,
       percent_complete)::software_download_data) as v
       from software_download group by host_id, software_binary_id
   ) q;

this cleaned up the odd subplans but is still slow:
dev20400=# explain analyze select * from foo join latest_download
using (host_id, software_binary_id);

QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
Hash Join  (cost=1308.84..1467.81 rows=25 width=40) (actual
time=1472.668..1914.799 rows=494 loops=1)
  Hash Cond: ((q.host_id = foo.host_id) AND (q.software_binary_id =
foo.software_binary_id))
  ->  HashAggregate  (cost=1293.48..1350.17 rows=4535 width=94)
(actual time=1467.002..1700.388 rows=37247 loops=1)
        ->  Seq Scan on software_download  (cost=0.00..953.42
rows=45342 width=94) (actual time=0.014..274.747 rows=45342 loops=1)
  ->  Hash  (cost=7.94..7.94 rows=494 width=8) (actual
time=5.028..5.028 rows=494 loops=1)
        ->  Seq Scan on foo  (cost=0.00..7.94 rows=494 width=8)
(actual time=0.022..2.507 rows=494 loops=1)
Total runtime: 1918.721 ms

compare it to this:
dev20400=# explain analyze select * from foo f where exists (select *
from latest_download where host_id = f.host_id and software_binary_id
= f.software_binary_id);

   QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------
Seq Scan on foo f  (cost=0.00..3122.01 rows=247 width=8) (actual
time=0.152..45.941 rows=494 loops=1)
  Filter: (subplan)
  SubPlan
    ->  Subquery Scan q  (cost=0.00..6.30 rows=1 width=40) (actual
time=0.081..0.081 rows=1 loops=494)
          ->  GroupAggregate  (cost=0.00..6.29 rows=1 width=94)
(actual time=0.065..0.065 rows=1 loops=494)
                ->  Index Scan using software_download_idx on
software_download  (cost=0.00..6.27 rows=1 width=94) (actual
time=0.013..0.021 r
                      Index Cond: ((host_id = $0) AND
(software_binary_id = $1))
Total runtime: 48.323 ms
(8 rows)

Time: 49.851 ms

I since I need both sides, I can't figure out a way to force the index
to be used during the join except to use a function to look up the
view based on the key, which works:
dev20400=# explain analyze select latest_download(host_id,
software_binary_id) from foo;
                                                        QUERY PLAN
--------------------------------------------------------------------------------------------------
Seq Scan on foo  (cost=0.00..9.18 rows=494 width=8) (actual
time=0.566..51.605 rows=494 loops=1)
Total runtime: 54.290 ms
(2 rows)

dev20400=# explain analyze select * from latest_download where host_id
= 1 and software_binary_id = 12345;
                                                                     QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------
Subquery Scan q  (cost=0.00..6.30 rows=1 width=40) (actual
time=0.046..0.046 rows=0 loops=1)
  ->  GroupAggregate  (cost=0.00..6.29 rows=1 width=94) (actual
time=0.035..0.035 rows=0 loops=1)
        ->  Index Scan using software_download_idx on
software_download  (cost=0.00..6.27 rows=1 width=94) (actual
time=0.024..0.024 rows=0 lo
              Index Cond: ((host_id = 1) AND (software_binary_id = 12345))
Total runtime: 0.134 ms

For some reason, I can't get the index to be used on the table sitting
under a view during a join, even though it should be, or at least it
seems....

merlin


[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux