On 7/18/07, Pat Maddox <pergesu@xxxxxxxxx> wrote:
On 7/18/07, Michael Glaesemann <grzm@xxxxxxxxxxxxxxx> wrote: > > On Jul 18, 2007, at 16:12 , Pat Maddox wrote: > > > ERROR: invalid reference to FROM-clause entry for table "video_views" > > LINE 20: JOIN assets ON (video_views.video_id=videos.id) > > ^ > > HINT: There is an entry for table "video_views", but it cannot be > > referenced from this part of the query. > > It's because I mismatched the JOIN clauses during my copy-and-paste :( > > > On 7/18/07, Michael Glaesemann <grzm@xxxxxxxxxxxxxxx> wrote: > > >> FROM video_views > >> JOIN assets ON (video_views.video_id=videos.id) > >> JOIN videos ON (video_views.asset_id=assets.id) > > This should be > > FROM video_views > JOIN assets ON (video_views.asset_id=assets.id) > JOIN videos ON (video_views.video_id=videos.id) > > Do you have the EXPLAIN ANALYE output of the query? > > Michael Glaesemann > grzm seespotcode net > > > For some reason the functions you wrote are giving me trouble (there's a BIGINT involved, I tried changing the functions around but kept having issues). So here's the full query, hopefully formatted better: SELECT SUM(CASE WHEN (hit IS TRUE AND logged_in IS TRUE) THEN 1 ELSE 0 END) AS count_hits_console, SUM(CASE WHEN (hit IS TRUE AND logged_in IS FALSE) THEN 1 ELSE 0 END) AS count_hits_remote, SUM(CASE WHEN (played IS TRUE AND logged_in IS TRUE) THEN 1 ELSE 0 END) AS count_played_console, SUM(CASE WHEN (played IS TRUE AND logged_in IS FALSE) THEN 1 ELSE 0 END) AS count_played_remote, SUM(CASE WHEN (downloaded IS TRUE AND logged_in IS TRUE) THEN 1 ELSE 0 END) AS count_downloaded_console, SUM(CASE WHEN (downloaded IS TRUE AND logged_in IS FALSE) THEN 1 ELSE 0 END) AS count_downloaded_remote, SUM(CASE WHEN ((played IS TRUE OR downloaded IS TRUE) AND logged_in IS TRUE) THEN assets.size ELSE 0 END) as download_size_console, SUM(CASE WHEN ((played IS TRUE OR downloaded IS TRUE) AND logged_in IS FALSE) THEN assets.size ELSE 0 END) AS download_size_remote, videos.id, videos.title, videos.guid FROM video_views JOIN assets ON (video_views.asset_id=assets.id) JOIN videos on (video_views.video_id=videos.id) WHERE videos.company_id=1 GROUP BY videos.id, videos.title, videos.guid ORDER BY count_hits_remote DESC LIMIT 100 and here's the EXPLAIN ANALYZE output: Limit (cost=127072.90..127073.12 rows=87 width=64) (actual time=2636.560..2636.567 rows=20 loops=1) -> Sort (cost=127072.90..127073.12 rows=87 width=64) (actual time=2636.558..2636.562 rows=20 loops=1) Sort Key: sum(CASE WHEN ((video_views.hit IS TRUE) AND (video_views.logged_in IS FALSE)) THEN 1 ELSE 0 END) -> HashAggregate (cost=127067.49..127070.10 rows=87 width=64) (actual time=2636.481..2636.506 rows=20 loops=1) -> Hash Join (cost=880.96..125995.46 rows=38983 width=64) (actual time=24.904..2635.719 rows=122 loops=1) Hash Cond: (video_views.asset_id = assets.id) -> Hash Join (cost=195.96..124433.01 rows=39009 width=60) (actual time=8.327..2618.982 rows=122 loops=1) Hash Cond: (video_views.video_id = videos.id) -> Seq Scan on video_views (cost=0.00..101352.70 rows=5998470 width=12) (actual time=0.031..1410.231 rows=5998341 loops=1) -> Hash (cost=194.87..194.87 rows=87 width=52) (actual time=1.001..1.001 rows=90 loops=1) -> Bitmap Heap Scan on videos (cost=4.93..194.87 rows=87 width=52) (actual time=0.111..0.840 rows=90 loops=1) Recheck Cond: (company_id = 1) -> Bitmap Index Scan on index_videos_on_company_id (cost=0.00..4.90 rows=87 width=0) (actual time=0.079..0.079 rows=90 loops=1) Index Cond: (company_id = 1) -> Hash (cost=487.78..487.78 rows=15778 width=12) (actual time=16.527..16.527 rows=15778 loops=1) -> Seq Scan on assets (cost=0.00..487.78 rows=15778 width=12) (actual time=0.023..9.601 rows=15778 loops=1) Total runtime: 2637.043 ms (17 rows) That one runs reasonably fine, because there are only 20 videos being returned and a handful of video views associated with them. In the real query there are about 1k videos and a couple million views. That took about 80 minutes to run, according to logs. Pat
Okay so it doesn't normally take 80 minutes to run. Something funky just happened and it took that long once. It usually takes between 90-100 seconds. My coworker told me it takes 80 minutes but it appears that's an anomaly. One thing we were wondering is if all the aggregate calculations might be slowing it down, and if it might be faster to do six separate queries. The real problem there is sorting and merging the data sets. Merging isn't tough, but making sure that all queries obey the desired sort order is. What do you think? Pat