On Fri, Jan 29, 2010 at 10:49 PM, Glenn Maynard <glenn@xxxxxxxx> wrote: > Hitting a performance issues that I'm not sure how to diagnose. > > SELECT highscores_for_steps_and_card(s.id, 591, 1) FROM stomp_steps s; > Seq Scan on stomp_steps s (cost=0.00..793.52 rows=2902 width=4) > (actual time=26509.919..26509.919 rows=0 loops=1) > Total runtime: 26509.972 ms > > The inner function looks like this: > > CREATE FUNCTION highscores_for_steps_and_card(steps_id int, card_id > int, count int) RETURNS SETOF INTEGER LANGUAGE SQL AS $$ > SELECT r.id FROM stomp_round r > WHERE ($1 IS NULL OR r.steps_id = $1) AND ($2 IS NULL OR > r.user_card_id = $2) > ORDER BY r.score DESC LIMIT $3 > $$ > > Limit (cost=13.12..13.12 rows=1 width=8) (actual time=0.054..0.054 > rows=0 loops=1) > -> Sort (cost=13.12..13.12 rows=1 width=8) (actual > time=0.051..0.051 rows=0 loops=1) > Sort Key: score > Sort Method: quicksort Memory: 17kB > -> Bitmap Heap Scan on stomp_round r (cost=9.09..13.11 > rows=1 width=8) (actual time=0.036..0.036 rows=0 loops=1) > Recheck Cond: ((280 = steps_id) AND (user_card_id = 591)) > -> BitmapAnd (cost=9.09..9.09 rows=1 width=0) (actual > time=0.032..0.032 rows=0 loops=1) > -> Bitmap Index Scan on stomp_round_steps_id > (cost=0.00..4.40 rows=20 width=0) (actual time=0.030..0.030 rows=0 > loops=1) > Index Cond: (280 = steps_id) > -> Bitmap Index Scan on stomp_round_user_card_id > (cost=0.00..4.44 rows=25 width=0) (never executed) > Index Cond: (user_card_id = 591) > Total runtime: 0.153 ms > (12 rows) > > stomp_steps has about 1500 rows, so it finds 1500 high scores, one for > each stage. > > I expected scalability issues from this on a regular drive, since > it'll be doing a ton of index seeking when not working out of cache, > so I expected to need to change to an SSD at some point (when it no > longer easily fits in cache). However, I/O doesn't seem to be the > bottleneck yet. If I run it several times, it consistently takes 26 > seconds. The entire database is in OS cache (find | xargs cat: > 250ms). > > I'm not sure why the full query (26s) is orders of magnitude slower > than 1500*0.150ms (225ms). It's not a very complex query, and I'd > hope it's not being re-planned every iteration through the loop. Any > thoughts? Using SELECT to iterate over a table like this is very > useful (and I don't know any practical alternative), but it's > difficult to profile since it doesn't play nice with EXPLAIN ANALYZE. I believe that the time for the seq-scan node doesn't include the time to generate the outputs, which is where all the function calls are. As a general rule, I have found that function calls are reaaaaally slow, and that calling a function in a loop is almost always a bad idea. You didn't mention what PG version you're running, but I believe that with a sufficiently new version (8.4?) it'll actually inline SQL functions into the invoking query, which will probably be lots faster. If not, you can inline it manually. Rewriting it as a join will likely be faster still: SELECT r.id FROM stomp_steps s, stomp_round r WHERE (s.id IS NULL OR r.steps_id = s.id) AND ($1 IS NULL OR r.user_card_id = $1) ORDER BY r.score DESC LIMIT $2 You might even break it into two cases: SELECT r.id FROM stomp_steps s, stomp_round r WHERE r.steps_id = s.id AND ($1 IS NULL OR r.user_card_id = $1) UNION ALL SELECT r.id FROM stomp_steps s, stomp_round r WHERE s.id IS NULL AND ($1 IS NULL OR r.user_card_id = $1) ORDER BY r.score DESC LIMIT $2 Or if s.id can't really be NULL: SELECT r.id FROM stomp_steps s, stomp_round r WHERE r.steps_id = s.id AND ($1 IS NULL OR r.user_card_id = $1) ORDER BY r.score DESC LIMIT $2 These kinds of rewrites allow the query planner progressively more flexibility - to use a hash or merge join, for example, instead of a nested loop. And they eliminate overhead. You'll have to play around with it and see what works best in your particular environment, but in general, I find it pays big dividends to avoid wrapping these kinds of logic bits inside a function. ...Robert -- Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance