Alvaro Herrera <alvherre@dcc.uchile.cl> writes: > On Sat, May 15, 2004 at 12:23:18AM -0400, Tom Lane wrote: >> ... It tries to write all the levels of the btree in >> parallel, so it is spitting out level-zero pages at a great rate, >> level-one pages at a lesser rate, etc. For a large index there could >> be many btree levels, and pages in the higher levels will be held locked >> in the shared buffer arena for considerable periods. > I'm not sure how btree from-scratch-building work, but could it be made > so that it first build level 0 completely, scanning the heap; then build > level 1 scanning the level 0, and so on? I don't think that would be a win; it would require an extra read pass over nearly all of the index. The idea here is to reduce the amount of I/O, not increase it. > I also wonder why index creation would write XLog information; Yeah, that was implicit in my comment about not using the buffer manager, and it was why I was concerned about being able to fsync the index file. Without xlog for backup, you'd better be able to force the index to disk before you commit. It turns out that btree index build currently writes each page three times: once to extend the file, once into WAL, and then finally to write the completed index page. It is difficult to do any better as long as you're using the shared buffer manager. (When extending a file, the buffer manager wants to allocate disk space before it will give you a buffer to write into. This is the right thing for heap insertions and extending an existing index, because we don't force buffers to disk before commit. So if it didn't work that way, you might commit your transaction before finding out there's no disk space to hold the data. But we could dispense with that worry for index build if we ensure that we write and fsync all the pages before commit.) So I was thinking about keeping the same index build process, but working in private storage instead of shared buffers, and writing direct to smgr. This would not only eliminate the locking conflict against checkpoints, but reduce the amount of I/O involved quite a lot. regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)