Re: pg_xlog becomes extremely large during CREATE INDEX

Tom Lane <tgl@sss.pgh.pa.us> · Sat, 15 May 2004 00:23:18 -0400

I wrote:
> I tried this locally, and what I see happening is that a checkpoint
> process starts shortly after the CREATE INDEX begins whomping out the
> index data --- but it doesn't finish until after the CREATE INDEX does.
> AFAICS there is not any sort of locking problem,

I have to take that back: there definitely is a locking problem.
Perhaps there is an I/O bandwidth issue too.

What I see happening on closer analysis is that btree CREATE INDEX can
hold "exclusive context lock" on some shared buffers for significant
periods of time.  It tries to write all the levels of the btree in
parallel, so it is spitting out level-zero pages at a great rate,
level-one pages at a lesser rate, etc.  For a large index there could
be many btree levels, and pages in the higher levels will be held locked
in the shared buffer arena for considerable periods.

CHECKPOINT scans the shared buffer arena and tries to write every dirty
page it finds.  This requires getting shared context lock, and so will
be blocked by the lock CREATE INDEX is holding.

I am toying with the idea that CREATE INDEX shouldn't use the shared
buffer manager at all; there is no need for other backends to touch the
index until the creating transaction commits.  We'd need to be able to
fsync the index file before committing.  That isn't part of the smgr API
right now, but could easily be added.

			regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 8: explain analyze is your friend