Re: temporary tables, indexes, and query plans

Jon Nelson <jnelson+pgsql@xxxxxxxxxxx> · Sat, 13 Nov 2010 10:14:54 -0600

On Sat, Nov 13, 2010 at 9:41 AM, Tom Lane <tgl@xxxxxxxxxxxxx> wrote:
> Jon Nelson <jnelson+pgsql@xxxxxxxxxxx> writes:
>> OK. This is a highly distilled example that shows the behavior.
>
>> BEGIN;
>> CREATE TEMPORARY TABLE foo AS SELECT x AS A, chr(x % 75 + 32) AS b,
>> ''::text AS c from generate_series(1,500) AS x;
>> UPDATE foo SET c = 'foo' WHERE b = 'A' ;
>> CREATE INDEX foo_b_idx on foo (b);
>> [ and the rest of the transaction can't use that index ]
>
> OK, this is an artifact of the "HOT update" optimization. ÂBefore
> creating the index, you did updates on the table that would have been
> executed differently if the index had existed. ÂWhen the index does get
> created, its entries for those updates are incomplete, so the index
> can't be used in transactions that could in principle see the unmodified
> rows.

Aha!  When you indicated that HOT updates were part of the problem, I
googled HOT updates for more detail and ran across this article:
http://pgsql.tapoueh.org/site/html/misc/hot.html
which was very useful in helping me to understand things.

If I understand things correctly, after a tuple undergoes a HOT-style
update, there is a chain from the original tuple to the updated tuple.
If an index already exists on the relation (and involves the updated
column), a *new entry* in the index is created.  However, if an index
does not already exist and one is created (which involves a column
with tuples that underwent HOT update) then it seems as though the
index doesn't see either version. Is that description inaccurate?

What would the effect be of patching postgresql to allow indexes to
see and follow the HOT chains during index creation?

The reason I did the update before the index creation is that the
initial update (in the actual version, not this test version) updates
2.8 million of some 7.5 million rows (or a bit under 40% of the entire
table), and such a large update seems like it would have a deleterious
effect on the index (although in either case the planner properly
chooses a sequential scan for this update).

> You could avoid this effect either by creating the index before you do
> any updates on the table, or by not wrapping the entire process into a
> single transaction.

I need the whole thing in a single transaction because I make
/extensive/ use of temporary tables and many dozens of statements that
need to either succeed or fail as one.

Is this "HOT update" optimization interaction with indexes documented
anywhere? It doesn't appear to be common knowledge as there are now 20
messages in this topic and this is the first mention of the HOT
updates / index interaction. I would like to suggest that an update to
the CREATE INDEX documentation might contain some caveats about
creating indexes in transactions on relations that might have HOT
updates.

Again, I'd like to thank everybody for helping me to figure this out.
It's not a huge burden to create the index before the updates, but
understanding *why* it wasn't working (even if it violates the
principle-of-least-surprise) helps quite a bit.

-- 
Jon

-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance