Re: Why does CREATE INDEX CONCURRENTLY need two scans?

Joshua Ma <josh@xxxxxxxxxxxxx> · Tue, 31 Mar 2015 21:06:27 -0700

Ah, that's exactly what I was looking for. Thanks everyone for the responses!
- Josh
ᐧ

On Tue, Mar 31, 2015 at 8:54 PM, Tom Lane <tgl@xxxxxxxxxxxxx> wrote:
Michael Paquier <michael.paquier@xxxxxxxxx> writes:

> On Wed, Apr 1, 2015 at 9:43 AM, Joshua Ma <josh@xxxxxxxxxxxxx> wrote:

>> Why are two scans necessary? What would break if it did something like the

>> following?

>>

>> 1) insert pg_index entry, wait for relevant txns to finish, mark index

>> open for inserts

>>

>> 2) build index in a single snapshot, mark index valid for searches

>> Wouldn't new inserts update the index correctly? Between the snapshot and

>> index-updating txns afterwards, wouldn't all updates be covered?

> When an index is built with index_build, are included in the index only the

> tuples seen at the start of the first scan. A second scan is needed to add

> in the index entries for the tuples that have been inserted into the table

> during the build phase.

More to the point: Joshua's design supposes that retail insertions into

an index can happen in parallel with index build.  Or in other words,

that index build consists of instantaneously creating an empty-but-valid

index file and then doing a lot of ordinary inserts into it.  That's a

possible design, but it's not very efficient, and most of our index AMs

don't do it that way.  btree, for instance, starts by sorting all the

entries and creating the leaf-level pages.  Then it builds the upper tree

levels.  It doesn't have a complete tree that could support retail

insertions until the very end.  Moreover, most of the work is done in

storage that's local to the backend running CREATE INDEX, and isn't

accessible to other processes at all.

                        regards, tom lane