Re: Slow SELECT by primary key? Postgres 9.1.2

Evgeny Shishkin <itparanoia@xxxxxxxxx> · Mon, 27 May 2013 18:59:36 +0400

On May 27, 2013, at 6:35 PM, John Mudd <johnbmudd@xxxxxxxxx> wrote:

Thanks, that's easy enough to test. Didn't seem to help though.

Ok. And if you CLUSTER tables USING PK?

test=# REINDEX index test_select_pkey;
REINDEX

test=# VACUUM ANALYZE test_select ;
VACUUM

(stopped postgres; reset O/S cache; started postgres)

test=# explain analyze SELECT * FROM test_select WHERE key1 >= 500000 ORDER BY key1, key2, key3, id LIMIT 1;
                                                              QUERY PLAN                                                              

--------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..0.08 rows=1 width=21) (actual time=16.368..16.369 rows=1 loops=1)

   ->  Index Scan using my_key on test_select  (cost=0.00..41981.16 rows=501333 width=21) (actual time=16.366..16.366 rows=1 loops=1)
         Index Cond: (key1 >= 500000)
 Total runtime: 16.444 ms

(stopped postgres; reset O/S cache; started postgres)

test=# explain analyze SELECT * FROM test_select WHERE id = 500000;

                                                           QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------

 Index Scan using test_select_pkey on test_select  (cost=0.00..8.36 rows=1 width=21) (actual time=23.072..23.074 rows=1 loops=1)

   Index Cond: (id = 500000)
 Total runtime: 23.192 ms

On Mon, May 27, 2013 at 10:21 AM, Evgeny Shishkin <itparanoia@xxxxxxxxx> wrote:

On May 27, 2013, at 6:02 PM, John Mudd <johnbmudd@xxxxxxxxx> wrote:

> Postgres 9.1.2 on Ubuntu 12.04

>

> Any reason why a select by primary key would be slower than a select that includes an ORDER BY? I was really hoping using the primary key would give me a boost.

>

You created my_key after data loading, and PK was there all the time.

If you REINDEX PK, i bet it will be as fast.

> I stopped the server and cleared the O/S cache using "sync; echo 3 > /proc/sys/vm/drop_caches" between the runs.

>

>

>

> test=# VACUUM ANALYZE test_select;

> VACUUM

>

> (stopped postgres; reset O/S cache; started postgres)

>

> test=# explain analyze SELECT * FROM test_select WHERE key1 >= 500000 ORDER BY key1, key2, key3, id LIMIT 1;

>                                                               QUERY PLAN

> --------------------------------------------------------------------------------------------------------------------------------------

>  Limit  (cost=0.00..0.08 rows=1 width=21) (actual time=12.599..12.600 rows=1 loops=1)

>    ->  Index Scan using my_key on test_select  (cost=0.00..41895.49 rows=498724 width=21) (actual time=12.597..12.597 rows=1 loops=1)

>          Index Cond: (key1 >= 500000)

>  Total runtime: 12.678 ms

>

> (stopped postgres; reset O/S cache; started postgres)

>

> test=# explain analyze SELECT * FROM test_select WHERE id = 500000;

>                                                            QUERY PLAN

> ---------------------------------------------------------------------------------------------------------------------------------

>  Index Scan using test_select_pkey on test_select  (cost=0.00..8.36 rows=1 width=21) (actual time=31.396..31.398 rows=1 loops=1)

>    Index Cond: (id = 500000)

>  Total runtime: 31.504 ms

>

>

>

> Schema:

>

> test=# \d test_select

>                             Table "public.test_select"

>  Column |     Type     |                        Modifiers

> --------+--------------+----------------------------------------------------------

>  id     | integer      | not null default nextval('test_select_id_seq'::regclass)

>  key1   | integer      |

>  key2   | integer      |

>  key3   | integer      |

>  data   | character(4) |

> Indexes:

>     "test_select_pkey" PRIMARY KEY, btree (id)

>     "my_key" btree (key1, key2, key3, id)

>

> test=#

>

>

>

> Sample data:

>

> test=# SELECT * FROM test_select LIMIT 10;

>  id |  key1  |  key2  |  key3  | data

> ----+--------+--------+--------+------

>   1 | 984966 | 283954 | 772063 | x

>   2 | 817668 | 393533 | 924888 | x

>   3 | 751039 | 798753 | 454309 | x

>   4 | 128505 | 329643 | 280553 | x

>   5 | 105600 | 257225 | 710015 | x

>   6 | 323891 | 615614 |  83206 | x

>   7 | 194054 |  63506 | 353171 | x

>   8 | 212068 | 881225 | 271804 | x

>   9 | 644180 |  26693 | 200738 | x

>  10 | 136586 | 498699 | 554417 | x

> (10 rows)

>

>

>

>

> Here's how I populated the table:

>

> import psycopg2

>

> conn = psycopg2.connect('dbname=test')

>

> cur = conn.cursor()

>

> def random_int():

>     n = 1000000

>     return random.randint(0,n)

>

> def random_key():

>     return random_int(), random_int(), random_int()

>

> def create_table():

>     cur.execute('''

>             DROP TABLE IF EXISTS test_select;

>

>             CREATE TABLE test_select (

>                 id                      SERIAL PRIMARY KEY,

>                 key1                    INTEGER,

>                 key2                    INTEGER,

>                 key3                    INTEGER,

>                 data                    char(4)

>             );

>         ''')

>     conn.commit()

>

>     n = 1000000

>     for i in range(n):

>         cur.execute("INSERT INTO test_select(key1, key2, key3, data) VALUES(%s, %s, %s, 'x')", random_key())

>     conn.commit()

>

>     cur.execute('CREATE INDEX my_key ON test_select(key1, key2, key3, id)')

>     conn.commit()

>

> create_table()

>