ORDER BY ... LIMIT and JOIN

Fizu <Fizu@xxxxxxxxxxxxxxxxx> · Sat, 8 Aug 2009 03:02:47 -0300

Hello,

I'm trying to optimize the follow query which returns the top users
ordered by ranking. I'll show you my schema and "explain analyze" for
each case.

So, i'm asking two things:

1) Why "ranking" index is not used in the second query when sorting.
2) Am i missing some obvious optimization like a missing index? :)

Schemas:

# \d ranking
            Table "public.ranking"
  Column   |         Type          | Modifiers
-----------+-----------------------+-----------
 ranking   | bigint                |
 score     | double precision      |
 username  | character varying(20) | not null
 variation | bigint                |
Indexes:
    "ranking_tmp_pkey1" PRIMARY KEY, btree (username)
    "idxrank_6057" btree (ranking) CLUSTER

# \d user
                                  Table "public.user"
   Column   |         Type          |                     Modifiers
------------+-----------------------+---------------------------------------------------
 id         | integer               | not null default
nextval('user_id_seq'::regclass)
 username   | character varying(20) | not null
 about      | text                  |
 name       | character varying(50) |
 photo      | text                  |
 country_id | integer               |
Indexes:
    "user_pkey" PRIMARY KEY, btree (username)
    "country_ranking_user_idx" btree (country_id)

Explain:

#  explain analyze SELECT * FROM "ranking" INNER JOIN "user" ON
("ranking"."username" = "user"."username") WHERE "user"."country_id" =
1  ORDER BY "ranking"."ranking" ASC LIMIT 100;

 QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=13.03..13.04 rows=1 width=180) (actual
time=965.229..965.302 rows=100 loops=1)
   ->  Sort  (cost=13.03..13.04 rows=1 width=180) (actual
time=965.227..965.256 rows=100 loops=1)
         Sort Key: ranking.ranking
         Sort Method:  top-N heapsort  Memory: 56kB
         ->  Nested Loop  (cost=0.00..13.02 rows=1 width=180) (actual
time=0.049..900.847 rows=57309 loops=1)
               ->  Index Scan using country_ranking_user_idx on "user"
 (cost=0.00..6.49 rows=1 width=145) (actual time=0.023..57.633
rows=57309 loops=1)
                     Index Cond: (country_id = 1)
               ->  Index Scan using ranking_tmp_pkey1 on ranking
(cost=0.00..6.52 rows=1 width=35) (actual time=0.013..0.013 rows=1
loops=57309)
                     Index Cond: ((ranking.username)::text =
("user".username)::text)
 Total runtime: 965.412 ms
(10 rows)

# explain analyze SELECT * FROM "ranking" INNER JOIN "user" ON
("ranking"."username" = "user"."username") ORDER BY
"ranking"."ranking" ASC LIMIT 100;

QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..137.02 rows=100 width=180) (actual
time=0.056..1.973 rows=100 loops=1)
   ->  Nested Loop  (cost=0.00..3081316.65 rows=2248753 width=180)
(actual time=0.055..1.921 rows=100 loops=1)
         ->  Index Scan using idxrank_6057 on ranking
(cost=0.00..70735.73 rows=2248753 width=35) (actual time=0.021..0.076
rows=100 loops=1)
         ->  Index Scan using user_pkey on "user"  (cost=0.00..1.33
rows=1 width=145) (actual time=0.016..0.017 rows=1 loops=100)
               Index Cond: (("user".username)::text = (ranking.username)::text)
 Total runtime: 2.043 ms
(6 rows)

Thanks!
Fz

-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance