Re: Slow SELECT by primary key? Postgres 9.1.2

Matheus de Oliveira <matioli.matheus@xxxxxxxxx> · Tue, 28 May 2013 09:39:45 -0300

On Mon, May 27, 2013 at 11:02 AM, John Mudd <johnbmudd@xxxxxxxxx> wrote:

Postgres 9.1.2 on Ubuntu 12.04

Any reason why a select by primary key would be slower than a select that includes an ORDER BY? I was really hoping using the primary key would give me a boost. 

I stopped the server and cleared the O/S cache using "sync; echo 3 > /proc/sys/vm/drop_caches" between the runs.

test=# VACUUM ANALYZE test_select;
VACUUM

(stopped postgres; reset O/S cache; started postgres)

test=# explain analyze SELECT * FROM test_select WHERE key1 >= 500000 ORDER BY key1, key2, key3, id LIMIT 1;

                                                              QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------

 Limit  (cost=0.00..0.08 rows=1 width=21) (actual time=12.599..12.600 rows=1 loops=1)
   ->  Index Scan using my_key on test_select  (cost=0.00..41895.49 rows=498724 width=21) (actual time=12.597..12.597 rows=1 loops=1)

         Index Cond: (key1 >= 500000)
 Total runtime: 12.678 ms

(stopped postgres; reset O/S cache; started postgres)

test=# explain analyze SELECT * FROM test_select WHERE id = 500000;

                                                           QUERY PLAN                                                            
---------------------------------------------------------------------------------------------------------------------------------

 Index Scan using test_select_pkey on test_select  (cost=0.00..8.36 rows=1 width=21) (actual time=31.396..31.398 rows=1 loops=1)
   Index Cond: (id = 500000)
 Total runtime: 31.504 ms

Schema:

test=# \d test_select
                            Table "public.test_select"
 Column |     Type     |                        Modifiers                         

--------+--------------+----------------------------------------------------------
 id     | integer      | not null default nextval('test_select_id_seq'::regclass)
 key1   | integer      | 

 key2   | integer      | 
 key3   | integer      | 
 data   | character(4) | 
Indexes:
    "test_select_pkey" PRIMARY KEY, btree (id)
    "my_key" btree (key1, key2, key3, id)

test=# 

Sample data:

test=# SELECT * FROM test_select LIMIT 10;
 id |  key1  |  key2  |  key3  | data 

----+--------+--------+--------+------
  1 | 984966 | 283954 | 772063 | x   
  2 | 817668 | 393533 | 924888 | x   
  3 | 751039 | 798753 | 454309 | x   
  4 | 128505 | 329643 | 280553 | x   

  5 | 105600 | 257225 | 710015 | x   
  6 | 323891 | 615614 |  83206 | x   
  7 | 194054 |  63506 | 353171 | x   
  8 | 212068 | 881225 | 271804 | x   
  9 | 644180 |  26693 | 200738 | x   

 10 | 136586 | 498699 | 554417 | x   
(10 rows)

For me looks like "my_key" index should be better than the PK in this case. For some reasons:

1. You are using a ORDER BY that has the same fields (and at the same order) from your index, so PG only needs to navigate the index.

2. You are using LIMIT 1, which means PG only needs to fetch the first element which key1>=50000 (and stop the search right after it).

In the case of your PK, PG will need to navigate through the index and return only one value also, but in this case the number of entries it needs to look at is bigger, because "id" has more distinct values than "key1".

Regards,
-- 
Matheus de Oliveira
Analista de Banco de Dados
Dextra Sistemas - MPS.Br nível F!
www.dextra.com.br/postgres