On Fri, Feb 24, 2012 at 4:46 PM, Cherio <cherio@xxxxxxxxx> wrote:
> We are experiencing an unusual slowdown when using a UUID field in a JOIN
> when updating a table. The SQL looks like this:
>
> UPDATE dst
> SET data_field = src.data_field
> FROM src
> WHERE dst.uuid_field = src.uuid_field;
>
> This statement takes over 6 times longer than a similar statement against
> the same table, except the join is done on an integer field, e.g.
>
> UPDATE dst
> SET data_field = src.data_field
> FROM src
> WHERE dst.integer_field = src.integer_field;
>
> I can't get rid of the UUID in the "src" table since it comes from another
> database that we can't change. The table has around 1 million rows. I tried
> vacuuming it. I tried creating indexes on the src table (it ignores them
> and builds a hash join anyway). It is faster to rebuild the whole table
> than to update it while joining by UUID. Has anyone experienced this
> before, and what was the solution for you?

If you're updating every field in the table, you're basically rebuilding the
whole table anyway. Also, both the heap and the indexes have to track both
row versions. HOT helps for updates to non-indexed fields, but the HOT
optimization tends to only really shine when the updates are small and
frequent. In Postgres it's good to avoid large updates when it's reasonable
to do so.

The UUID is slower because it adds lots of bytes to both the heap and the
index, although 6 times slower does seem like a lot. Can you simulate a
similar update with a text column to see whether the performance difference
is related to row/key size?

merlin

--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
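The text-column experiment suggested above could be sketched roughly as
follows. This is a hypothetical setup, not from the original thread: the
`src_txt`/`dst_txt` tables are invented for the test, and `md5()` is used
only because its 32-character hex output is close in width to a UUID
rendered as text (36 characters), while the native `uuid` type is stored
as 16 bytes.

```sql
-- Hypothetical test tables mirroring src/dst from the post, but keyed on
-- a text column of roughly UUID-as-text width.
CREATE TABLE src_txt AS
SELECT md5(uuid_field::text) AS text_field, data_field
FROM src;

CREATE TABLE dst_txt AS
SELECT text_field, data_field
FROM src_txt;

-- Time this and compare it against the uuid-keyed and integer-keyed
-- updates; EXPLAIN ANALYZE also shows whether the planner picks the
-- same hash join in all three cases.
EXPLAIN ANALYZE
UPDATE dst_txt
SET data_field = src_txt.data_field
FROM src_txt
WHERE dst_txt.text_field = src_txt.text_field;
```

If the text-keyed update lands near the UUID timing, key width is the likely
culprit; if it lands near the integer timing, something else (e.g. the
uuid comparison path or plan differences) deserves a closer look.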