Hi Maxim,

Thanks for the reply, very interesting.

Do you think the 10,000 rows are randomly spread because of the UUID type we chose, namely UUID v4, i.e. the completely random one? If we'd chosen UUID v1 (MAC address + timestamp), would the rows have been placed more regularly and so been faster to pull back? Just curious as to why you said "randomly spread". Also bear in mind that we did experiment with both btree and hash indexes on the uuid column.

RE: increasing the memory. We're currently at 64GB, with the following conf settings:

max_connections = 100
shared_buffers = 10GB
effective_cache_size = 45GB
work_mem = 100MB
maintenance_work_mem = 1GB
checkpoint_segments = 128
checkpoint_completion_target = 0.9
wal_buffers = 16MB

Is it worth (do you think) experimenting with work_mem, and if so to what degree? If we did add more RAM, would it be the effective_cache_size setting that we would alter? Is there a way to force PG to load a particular table into RAM? If so, is it actually a good idea?

Thanks again,
R

On 21 March 2015 at 09:10, Maxim Boguk <maxim.boguk@xxxxxxxxx> wrote:
>
> On Sat, Mar 21, 2015 at 6:01 AM, Roland Dunn <roland.dunn@xxxxxxxxx> wrote:
>>
>> Hi,
>> Wonder if anyone can help.
>>
>> Have a lookup table where the primary key is a native uuid type
>> (filled with UUIDs of type 4), around 50m rows in size.
>>
>> Have a separate table, table A, of similar size (around 50m rows).
>> The primary key in table A is the standard integer, nextval, etc. type
>> primary key. Table A also has a uuid column. The uuid column in table
>> A (native Postgres uuid type) has a "UNIQUE CONSTRAINT, btree (uuid)"
>> constraint on it.
>>
>> Currently regularly running the following set of queries:
>> 1. Pull around 10,000 rows from the lookup table.
>> 2. Use the UUIDs from (1) to query table A.
>>
>> Query (2) above is running slowly: typically around 40-50 seconds to
>> pull 8,000-10,000 rows, which is pretty slow. The table has various
>> other columns: 4 text fields and a couple of JSON fields, so each row in
>> table A is fairly "fat" (if that's the correct expression).
>
> Hi Roland,
>
> It's very likely that the query is IO-bound.
> A typical single SATA drive can perform around 100 IOPS.
> As a result, to fetch 10,000 randomly spread rows the HDD must spend ~100 seconds,
> which is pretty close to your actual timings.
>
> I suggest enabling track_io_timing in postgresql.conf, and then using EXPLAIN
> (ANALYZE, COSTS, BUFFERS, TIMING) instead of a simple EXPLAIN ANALYZE. It will
> help you see the time spent on IO operations.
>
> If your load is actually IO-bound, I can suggest 3 possible ways to make
> things better:
> 1) use a good server-grade SSD drive instead of an HDD;
> 2) increase memory on the server so the database fits comfortably into RAM;
> 3) use RAID 10 with a good RAID controller and 6-12 SAS drives.
>
> The database cannot retrieve rows faster than the underlying file system
> can fetch data from the HDD.
>
> --
> Maxim Boguk
> Senior Postgresql DBA
> http://www.postgresql-consulting.ru/

--
Kind regards,
Roland

Roland Dunn
--------------------------

m: +44 (0)7967 646 789
e: roland.dunn@xxxxxxxxx

w: http://www.cloudshapes.co.uk/
https://twitter.com/roland_dunn
http://uk.linkedin.com/in/rolanddunn
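
A minimal sketch of the diagnostic Maxim suggests (track_io_timing plus EXPLAIN with buffers and timing). The table names table_a and lookup_table, the join on the uuid columns, and the batch_id filter are placeholders, not taken from the actual schema:

-- Session-level setting needs superuser; otherwise set track_io_timing = on
-- in postgresql.conf as Maxim suggests and reload.
SET track_io_timing = on;

-- With BUFFERS and track_io_timing on, the plan reports shared buffer hits
-- vs. reads and "I/O Timings", showing how much of the 40-50 s is spent
-- waiting on random heap reads.
EXPLAIN (ANALYZE, COSTS, BUFFERS, TIMING)
SELECT a.*
FROM table_a AS a
JOIN lookup_table AS l ON a.uuid = l.uuid
WHERE l.batch_id = 42;   -- hypothetical filter selecting the ~10,000 lookup rows

If the reported I/O read time accounts for most of the total runtime, that would confirm the random-read bottleneck Maxim describes.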
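
On the work_mem question, one low-risk way to experiment is per transaction rather than globally; the 256MB value below is only a hypothetical trial figure:

BEGIN;
-- SET LOCAL only affects this transaction, so the global 100MB setting
-- and other sessions are untouched.
SET LOCAL work_mem = '256MB';
-- re-run the slow query here and compare the EXPLAIN (ANALYZE, BUFFERS) output
COMMIT;

Note that work_mem is allocated per sort/hash operation per backend, so with max_connections = 100 a large global value can add up quickly; per-session or per-transaction trials avoid that risk.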
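
On forcing a particular table into RAM: one option available in contrib since PostgreSQL 9.4 is pg_prewarm; a sketch, with table_a again as a placeholder name:

CREATE EXTENSION IF NOT EXISTS pg_prewarm;

-- 'buffer' loads the relation into shared_buffers;
-- 'read' or 'prefetch' only warm the OS page cache.
SELECT pg_prewarm('table_a', 'buffer');

Whether it is actually a good idea depends on whether the hot part of the table fits within the 10GB shared_buffers (or the 64GB of RAM for the OS cache); prewarming something larger than the cache mostly just churns it.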