Re: UUIDs & Clustered Indexes

Francisco Olarte <folarte@xxxxxxxxxxxxxx> · Tue, 30 Aug 2016 16:58:45 +0200

Luke:

On Tue, Aug 30, 2016 at 3:59 PM, Luke Gordon <gordysc@xxxxxxxxx> wrote:
> I'm trying to decide on which UUID generator to use for my Postgres
> database, and I've narrowed it down to gen_random & uuid_generate_v1mc.
>
> There's a fascinating article that discusses performance implications
> between gen_random_uuid & uuid_generate_v1mc:
> https://www.starkandwayne.com/blog/uuid-primary-keys-in-postgresql/
> TL;DR, the article suggests:
> "Random produces very fragmented inserts that destroy tables. Use
> uuid_generate_v1mc() [instead].."

He probably means destroy index, not tables, tables are not index
ordered in postgres ( like some kind of clustered tables in other
products )

>
> However, according to a message on this mailing list, Postgres doesn't have
> clustered indexes:
> "But Postgres doesn't _have_ clustered indexes, so that article doesn't
> apply at all. The other authors appear to have missed this important point."
> https://www.postgresql.org/message-id/56798352.7060902%40uchicago.edu

No, he hasn't. If you are generating them to use as a kind of primary
key, like a serial, having a timestamp based means they are generated
in ascending order, which means you append on the table ( at the end )
and on the index, so they end up better filled and perform better.
Specially in the index, ordered insertions tend to be well optimized,
being a common case, and perform quite well. It's not going to be as
advantegous as on a clustered table system, but it will help ( but not
that much ).

> But, doing a quick check, it appears Postgres does indeed have a mechanism
> for a clustered index:
> https://www.postgresql.org/docs/9.5/static/sql-cluster.html
> So, does anyone know which is best? Or are the performance differences so
> minute they'd only matter in extremely unique circumstances?

Cluster just resorts the table and rebuild the index, it's already
being told in other messages. It's equivalent to create temp table tt
as select * from the_table, truncate th_table, insert into the_table
select * from tt order by index_expression, drop table tt. It is nice
to do it for tables that are normally ordered but somehow lost it.
Like having a log table with an indexed field for insertion timestamp
and updating it a lot, or purging many old records. As you normally
would typically query it with a range condition on the timestamp, a
cluster will help.

Francisco Olarte.

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general