Thanks for the replies. More detail and data below:

Table: "lookup"
    uuid: type uuid. not null. plain storage.
    datetime_stamp: type bigint. not null. plain storage.
    harvest_date_stamp: type bigint. not null. plain storage.
    state: type smallint. not null. plain storage.
Indexes:
    "lookup_pkey" PRIMARY KEY, btree (uuid)
    "lookup_32ff3898" btree (datetime_stamp)
    "lookup_6c8369bc" btree (harvest_date_stamp)
    "lookup_9ed39e2e" btree (state)
Has OIDs: no

Table: "article_data"
    id: type integer. not null default nextval('article_data_id_seq'::regclass). plain storage.
    title: type text.
    text: type text.
    insertion_date: type date.
    harvest_date: type timestamp with time zone.
    uuid: type uuid.
Indexes:
    "article_data_pkey" PRIMARY KEY, btree (id)
    "article_data_uuid_key" UNIQUE CONSTRAINT, btree (uuid)
Has OIDs: no

Both lookup and article_data have around 65m rows.

Two queries:

(1) SELECT uuid FROM lookup WHERE state = 200 LIMIT 4000;

OUTPUT FROM EXPLAIN (ANALYZE, BUFFERS):
------------------------------------------------
 Limit  (cost=0.00..4661.02 rows=4000 width=16) (actual time=0.009..1.036 rows=4000 loops=1)
   Buffers: shared hit=42
   ->  Seq Scan on lookup  (cost=0.00..1482857.00 rows=1272559 width=16) (actual time=0.008..0.777 rows=4000 loops=1)
         Filter: (state = 200)
         Rows Removed by Filter: 410
         Buffers: shared hit=42
 Total runtime: 1.196 ms
(7 rows)

Question: Why does this do a sequential scan and not an index scan, when there is a btree index on state?

(2) SELECT article_data.id, article_data.uuid, article_data.title, article_data.text
    FROM article_data
    WHERE uuid = ANY ('{f0d5e665-4f21-4337-a54b-cf0b4757db65,..... 3999 more uuid's ....}'::uuid[]);

OUTPUT FROM EXPLAIN (ANALYZE, BUFFERS):
------------------------------------------------
 Index Scan using article_data_uuid_key on article_data  (cost=5.56..34277.00 rows=4000 width=581) (actual time=0.063..66029.031 rows=4000 loops=1)
   Index Cond: (uuid = ANY ('{f0d5e665-4f21-4337-a54b-cf0b4757db65,5618754f-544b-4700-9d24-c364fd0ba4e9,958e37e3-6e6e-4b2a-b854-48e88ac1fdb7,ba56b483-59b2-4ae5-ae44-910401f3221b, ... remaining uuids omitted ... }'::uuid[]))
   Buffers: shared hit=16060 read=4084 dirtied=292
 Total runtime: 66041.443 ms
(4 rows)

Question: Why is this so slow? Is it because it's reading from disk (shared read=4084)?
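(A couple of diagnostic sketches, in case they help. For (1): the planner estimates ~1.27m of the 65m rows match state = 200, so with LIMIT 4000 it expects a sequential scan to hit 4000 matches almost immediately, which the 42 buffers confirm; an index scan would cost more here. A session-local way to compare against the index plan, purely as a diagnostic and not a production setting:)

    SET enable_seqscan = off;  -- session-local: makes seq scans look very expensive to the planner
    EXPLAIN (ANALYZE, BUFFERS)
    SELECT uuid FROM lookup WHERE state = 200 LIMIT 4000;
    RESET enable_seqscan;      -- restore the default

(For (2), if track_io_timing can be turned on, a superuser-only setting, repeating the EXPLAIN with it enabled adds an "I/O Timings" line, which would show whether the 66 seconds is genuinely spent waiting on the ~4000 scattered page reads:)

    SET track_io_timing = on;  -- superuser-only; EXPLAIN then reports time spent waiting on I/O
    EXPLAIN (ANALYZE, BUFFERS)
    SELECT article_data.id, article_data.uuid, article_data.title, article_data.text
    FROM article_data
    WHERE uuid = ANY ('{f0d5e665-4f21-4337-a54b-cf0b4757db65}'::uuid[]);  -- plus the other uuids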
On 24 March 2015 at 07:49, David Rowley <dgrowleyml@xxxxxxxxx> wrote:
> On 21 March 2015 at 23:34, Roland Dunn <roland.dunn@xxxxxxxxx> wrote:
>>
>> If we did add more RAM, would it be the effective_cache_size setting
>> that we would alter? Is there a way to force PG to load a particular
>> table into RAM? If so, is it actually a good idea?
>
> Have you had a look at EXPLAIN (ANALYZE, BUFFERS) for the query?
>
> Pay special attention to "Buffers: shared read=NNN" and "Buffers: shared
> hit=NNN". If you're not reading any buffers between runs, then the pages
> are in the PostgreSQL shared buffers; by the looks of your config you
> have 10GB of these. On the other hand, if you're getting buffer reads,
> then they're either coming from disk or from the OS cache. PostgreSQL
> won't really know the difference.
>
> If you're not getting any buffer reads and it's still slow, then the
> problem is not I/O.
>
> Just for fun... What happens if you stick the 50 UUIDs in some table,
> analyze it, then perform a join between the 2 tables, using IN() or
> EXISTS()? Is that any faster?
>
> Also, how well does it perform with: set enable_bitmapscan = off; ?
>
> Regards
>
> David Rowley

--
Kind regards,
Roland

Roland Dunn
--------------------------
m: +44 (0)7967 646 789
e: roland.dunn@xxxxxxxxx
w: http://www.cloudshapes.co.uk/
https://twitter.com/roland_dunn
http://uk.linkedin.com/in/rolanddunn
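(For what it's worth, a minimal sketch of the temp-table variant suggested in the quote above; the table name "wanted" is made up for illustration and hasn't been tested against this schema:)

    CREATE TEMP TABLE wanted (uuid uuid PRIMARY KEY);
    INSERT INTO wanted (uuid) VALUES
        ('f0d5e665-4f21-4337-a54b-cf0b4757db65');  -- ... and the other ~3999 uuids
    ANALYZE wanted;  -- give the planner row estimates for the temp table

    SELECT a.id, a.uuid, a.title, a.text
    FROM article_data a
    JOIN wanted w ON w.uuid = a.uuid;

(Running EXPLAIN (ANALYZE, BUFFERS) on the join would show whether the planner picks a different strategy than the = ANY() index scan.)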