On Sat, Nov 13, 2010 at 9:41 AM, Tom Lane <tgl@xxxxxxxxxxxxx> wrote: > Jon Nelson <jnelson+pgsql@xxxxxxxxxxx> writes: >> OK. This is a highly distilled example that shows the behavior. > >> BEGIN; >> CREATE TEMPORARY TABLE foo AS SELECT x AS A, chr(x % 75 + 32) AS b, >> ''::text AS c from generate_series(1,500) AS x; >> UPDATE foo SET c = 'foo' WHERE b = 'A' ; >> CREATE INDEX foo_b_idx on foo (b); >> [ and the rest of the transaction can't use that index ] > > OK, this is an artifact of the "HOT update" optimization. ÂBefore > creating the index, you did updates on the table that would have been > executed differently if the index had existed. ÂWhen the index does get > created, its entries for those updates are incomplete, so the index > can't be used in transactions that could in principle see the unmodified > rows. Aha! When you indicated that HOT updates were part of the problem, I googled HOT updates for more detail and ran across this article: http://pgsql.tapoueh.org/site/html/misc/hot.html which was very useful in helping me to understand things. If I understand things correctly, after a tuple undergoes a HOT-style update, there is a chain from the original tuple to the updated tuple. If an index already exists on the relation (and involves the updated column), a *new entry* in the index is created. However, if an index does not already exist and one is created (which involves a column with tuples that underwent HOT update) then it seems as though the index doesn't see either version. Is that description inaccurate? What would the effect be of patching postgresql to allow indexes to see and follow the HOT chains during index creation? The reason I did the update before the index creation is that the initial update (in the actual version, not this test version) updates 2.8 million of some 7.5 million rows (or a bit under 40% of the entire table), and such a large update seems like it would have a deleterious effect on the index (although in either case the planner properly chooses a sequential scan for this update). > You could avoid this effect either by creating the index before you do > any updates on the table, or by not wrapping the entire process into a > single transaction. I need the whole thing in a single transaction because I make /extensive/ use of temporary tables and many dozens of statements that need to either succeed or fail as one. Is this "HOT update" optimization interaction with indexes documented anywhere? It doesn't appear to be common knowledge as there are now 20 messages in this topic and this is the first mention of the HOT updates / index interaction. I would like to suggest that an update to the CREATE INDEX documentation might contain some caveats about creating indexes in transactions on relations that might have HOT updates. Again, I'd like to thank everybody for helping me to figure this out. It's not a huge burden to create the index before the updates, but understanding *why* it wasn't working (even if it violates the principle-of-least-surprise) helps quite a bit. -- Jon -- Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance