Hi,

Wondering if anyone can help.

I have a lookup table where the primary key is a native uuid type (filled with version-4 UUIDs), around 50m rows in size.

I have a separate table, table A, of similar size (around 50m rows). The primary key in table A is the standard integer, nextval, etc. type primary key. Table A also has a uuid column (native Postgres uuid type) with a "UNIQUE CONSTRAINT, btree (uuid)" constraint on it.

I'm currently running the following set of queries regularly:

1. Pull around 10,000 rows from the lookup table.
2. Use the UUIDs from (1) to query table A.

Query (2) is running slowly: typically around 40-50 seconds to pull 8,000-10,000 rows, which is pretty slow. Table A has various other columns (4 text fields, a couple of JSON fields), so each row is fairly "fat" (if that's the correct expression).

I've experimented with various forms of WHERE clause:

(a) ANY with an array literal: WHERE uuid = ANY ('{1dc384ea-ac3d-4e95-a33e-42f3d821c104,
(b) ANY + VALUES: WHERE uuid = ANY (VALUES ('38de2ff6-ceed-43f3-a6fa-7a731ffa8c20'::uuid), ('b956fa3a-87d0-42da-9a75-c498c7ca4650'));
(c) Multiple OR clauses

And I've experimented with both btree and hash indexes on uuid on table A, in various combinations: just btree, btree+hash, just hash.

By far the fastest (which in itself, as I've outlined above, is not very fast) is btree and the ANY form I've listed as (a) above. If I use btree + (a), EXPLAIN ANALYZE contains the following (below is for 4,000 rows on a much smaller database, one of only 1 million rows as opposed to 65 million):

  Index Scan using table_a_uuid_key on table_a  (cost=5.42..32801.60 rows=4000 width=585) (actual time=0.035..23.023 rows=4000 loops=1)
    Index Cond: (uuid = ANY ('{13aad9d6-bb45-4d98-a58b-b50147b6340d,40613404-ebf4-4343-8857-9 ... etc ....

Various comments I've read:

- Perhaps actually try a JOIN, e.g. a LEFT OUTER JOIN between the lookup table and table A.
- Perhaps increase work_mem (currently at 100MB).
- Perhaps there's not a lot that can be done.
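For reference, the LEFT OUTER JOIN suggestion would look roughly like the sketch below. This is a sketch only: the lookup table's name ("lookup_table") and its key column ("id"), and the way the ~10,000 lookup rows are selected, are assumed, since the real schema isn't shown.

```sql
-- Hypothetical names: lookup_table(id uuid); table_a has the unique
-- btree-indexed uuid column. Joining lets the planner consider merge or
-- hash join plans for the whole batch, instead of one index probe per
-- literal in a long ANY ('{...}') list.
SELECT a.*
FROM lookup_table l
LEFT OUTER JOIN table_a a ON a.uuid = l.id
WHERE l.id = ANY ($1);  -- or whatever predicate selects the ~10,000 rows
```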
By using uuid type 4, i.e. a fully random identifier, we're not looking at great performance, due to the fact that the IDs are so ... random and not sequential. We don't care about ordering, hence the experimentation with a hash index.

Incidentally, when experimenting with just the hash index and ANY, I would get the following in EXPLAIN ANALYZE:

  Bitmap Heap Scan on table_a_  (cost=160.36..320.52 rows=40 width=585) (actual time=0.285..0.419 rows=40 loops=1)
    Recheck Cond: (uuid = ANY ('{a4a47eab-6393-4613-b098-b287ea59f2a4,3f0c6111-4b1b-4dae-bd36-e3c8d2b4341b,3748ea41-cf83-4024-a66c-be6b88352b7
    ->  Bitmap Index Scan on table_a__uuid_hash_index  (cost=0.00..160.35 rows=40 width=0) (actual time=0.273..0.273 rows=40 loops=1)
          Index Cond: (uuid = ANY ('{a4a47eab-6393-4613-b098-b287ea59f2a4,3f0c6111-4b1b-4dae-bd36-e3c8d2b4341b,3748ea41-cf83-4024-a66c-be6b88352b75,b1894bd6-ff

Anyway, any suggestions or thoughts are very welcome.

Thanks,

R

--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance