Re: Nested query performance issue

Heikki Linnakangas <heikki.linnakangas@xxxxxxxxxxxxxxxx> · Thu, 09 Apr 2009 14:29:11 +0300

Glenn Maynard wrote:
This rewrite allows getting the top N scores.  Unfortunately, this one
takes 950ms for the same data.  With 1000000 scores, it takes 14800ms.

SELECT s.* FROM score s, game g
WHERE s.game_id = g.id AND
 s.id IN (
    SELECT s2.id FROM score s2 WHERE s2.game_id=g.id ORDER BY s2.score
DESC LIMIT 1
 );

You don't really need the join with game here, simplifying this into:

SELECT s.* FROM score s
WHERE s.id IN (
    SELECT s2.id FROM score s2 WHERE s2.game_id=s.game_id ORDER BY 
s2.score
DESC LIMIT 1
);

I don't think it makes it any faster, though.

You can also do this in a very nice and clean fashion using the upcoming 
PG 8.4 window functions:

SELECT * FROM (
  SELECT s.*, rank() OVER (PARTITION BY s.game_id ORDER BY score DESC) 
AS rank FROM score s
) AS sub WHERE rank <= 5;

but I'm not sure how much faster it is. At least here on my laptop it 
does a full index scan on score, which may or may not be faster than 
just picking the top N values for each game using the index.

This seems simple: for each game, search for the highest score, and
then scan the tree to get the next N-1 highest scores.  The first
version does just that, but the second one is doing a seq scan over
score.

You can do that approach with a SQL function:

CREATE FUNCTION topnscores(game_id int , n int) RETURNS SETOF score 
LANGUAGE SQL AS $$
SELECT * FROM score s WHERE s.game_id = $1 ORDER BY score DESC LIMIT $2
$$;

SELECT (sub.ts).id, (sub.ts).score, (sub.ts).game_id
FROM (SELECT topnscores(g.id, 5) ts FROM game g) sub;

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance