Search Postgresql Archives

Re: Need help optimizing this query

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 7/18/07, Pat Maddox <pergesu@xxxxxxxxx> wrote:
On 7/18/07, Michael Glaesemann <grzm@xxxxxxxxxxxxxxx> wrote:
>
> On Jul 18, 2007, at 16:12 , Pat Maddox wrote:
>
> > ERROR:  invalid reference to FROM-clause entry for table "video_views"
> > LINE 20: JOIN assets ON (video_views.video_id=videos.id)
> >                         ^
> > HINT:  There is an entry for table "video_views", but it cannot be
> > referenced from this part of the query.
>
> It's because I mismatched the JOIN clauses during my copy-and-paste :(
>
> > On 7/18/07, Michael Glaesemann <grzm@xxxxxxxxxxxxxxx> wrote:
>
> >> FROM video_views
> >> JOIN assets ON (video_views.video_id=videos.id)
> >> JOIN videos ON (video_views.asset_id=assets.id)
>
> This should be
>
> FROM video_views
> JOIN assets ON (video_views.asset_id=assets.id)
> JOIN videos ON (video_views.video_id=videos.id)
>
> Do you have the EXPLAIN ANALYE output of the query?
>
> Michael Glaesemann
> grzm seespotcode net
>
>
>

For some reason the functions you wrote are giving me trouble (there's
a BIGINT involved, I tried changing the functions around but kept
having issues).  So here's the full query, hopefully formatted better:

SELECT
    SUM(CASE WHEN (hit IS TRUE AND logged_in IS TRUE) THEN 1 ELSE 0
END) AS count_hits_console,
    SUM(CASE WHEN (hit IS TRUE AND logged_in IS FALSE) THEN 1 ELSE 0
END) AS count_hits_remote,
    SUM(CASE WHEN (played IS TRUE AND logged_in IS TRUE) THEN 1 ELSE 0
END) AS count_played_console,
    SUM(CASE WHEN (played IS TRUE AND logged_in IS FALSE) THEN 1 ELSE
0 END) AS count_played_remote,
    SUM(CASE WHEN (downloaded IS TRUE AND logged_in IS TRUE) THEN 1
ELSE 0 END) AS count_downloaded_console,
    SUM(CASE WHEN (downloaded IS TRUE AND logged_in IS FALSE) THEN 1
ELSE 0 END) AS count_downloaded_remote,
    SUM(CASE WHEN ((played IS TRUE OR downloaded IS TRUE) AND
logged_in IS TRUE) THEN assets.size ELSE 0 END) as
download_size_console,
    SUM(CASE WHEN ((played IS TRUE OR downloaded IS TRUE) AND
logged_in IS FALSE) THEN assets.size ELSE 0 END) AS
download_size_remote,
    videos.id,
    videos.title,
    videos.guid
FROM video_views
JOIN assets ON (video_views.asset_id=assets.id)
JOIN videos on (video_views.video_id=videos.id)
WHERE videos.company_id=1
GROUP BY  videos.id,
          videos.title,
          videos.guid
ORDER BY count_hits_remote DESC
LIMIT 100



and here's the EXPLAIN ANALYZE output:

 Limit  (cost=127072.90..127073.12 rows=87 width=64) (actual
time=2636.560..2636.567 rows=20 loops=1)
   ->  Sort  (cost=127072.90..127073.12 rows=87 width=64) (actual
time=2636.558..2636.562 rows=20 loops=1)
         Sort Key: sum(CASE WHEN ((video_views.hit IS TRUE) AND
(video_views.logged_in IS FALSE)) THEN 1 ELSE 0 END)
         ->  HashAggregate  (cost=127067.49..127070.10 rows=87
width=64) (actual time=2636.481..2636.506 rows=20 loops=1)
               ->  Hash Join  (cost=880.96..125995.46 rows=38983
width=64) (actual time=24.904..2635.719 rows=122 loops=1)
                     Hash Cond: (video_views.asset_id = assets.id)
                     ->  Hash Join  (cost=195.96..124433.01 rows=39009
width=60) (actual time=8.327..2618.982 rows=122 loops=1)
                           Hash Cond: (video_views.video_id = videos.id)
                           ->  Seq Scan on video_views
(cost=0.00..101352.70 rows=5998470 width=12) (actual
time=0.031..1410.231 rows=5998341 loops=1)
                           ->  Hash  (cost=194.87..194.87 rows=87
width=52) (actual time=1.001..1.001 rows=90 loops=1)
                                 ->  Bitmap Heap Scan on videos
(cost=4.93..194.87 rows=87 width=52) (actual time=0.111..0.840 rows=90
loops=1)
                                       Recheck Cond: (company_id = 1)
                                       ->  Bitmap Index Scan on
index_videos_on_company_id  (cost=0.00..4.90 rows=87 width=0) (actual
time=0.079..0.079 rows=90 loops=1)
                                             Index Cond: (company_id = 1)
                     ->  Hash  (cost=487.78..487.78 rows=15778
width=12) (actual time=16.527..16.527 rows=15778 loops=1)
                           ->  Seq Scan on assets  (cost=0.00..487.78
rows=15778 width=12) (actual time=0.023..9.601 rows=15778 loops=1)
 Total runtime: 2637.043 ms
(17 rows)


That one runs reasonably fine, because there are only 20 videos being
returned and a handful of video views associated with them.  In the
real query there are about 1k videos and a couple million views.  That
took about 80 minutes to run, according to logs.

Pat


Okay so it doesn't normally take 80 minutes to run.  Something funky
just happened and it took that long once.  It usually takes between
90-100 seconds.  My coworker told me it takes 80 minutes but it
appears that's an anomaly.

One thing we were wondering is if all the aggregate calculations might
be slowing it down, and if it might be faster to do six separate
queries.  The real problem there is sorting and merging the data sets.
Merging isn't tough, but making sure that all queries obey the
desired sort order is.  What do you think?

Pat


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]
  Powered by Linux