On Wed, 2008-02-20 at 14:31 +0530, Pavan Deolasee wrote:

> I think it would be interesting if we can build these indexes in
> parallel. Each index build requires a seq scan on the table. If the
> table does not fit in shared buffers, each index build would most
> likely result in lots of IO.

He's already said that his I/O usage was not the problem. For one
thing, he has 8GB of memory for a 5GB dataset.

Even when the table is much larger than memory, what percentage of the
time is spent on the table scan? A table scan is O(N), whereas an index
build is O(N log N). If you combine that with expensive comparisons,
e.g. for localized text, then I would guess that the index building
itself is much more expensive than the scans themselves. However,
building the indexes in parallel would allow better CPU utilization.

> One option would be to add this facility to the backend so that
> multiple indexes can be built with a single seq scan of the table.
> In theory, it should be possible, but might be tricky given the way
> index build works (it calls the respective ambuild method to build
> the index, which internally does the seq scan).

I don't think that this would be necessary, because (as you say below)
the synchronized scan facility should already handle this.

> Other option is to make pg_restore multi-threaded/processed. The
> synchronized_scans facility would then synchronize the multiple heap
> scans. ISTM that if we can make pg_restore multi-processed, then we
> can possibly add more parallelism to the restore process.

I like this approach more. I think that pg_restore is the right place
to do this, if we can keep the options reasonably simple to use.
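To make the O(N) vs. O(N log N) point concrete, here is a rough
stand-alone micro-benchmark. It is only an illustration: the row count
and the casefold-keyed sort standing in for a collation-aware
comparison are made-up values, not measurements from the machine
discussed in this thread.

```python
import random
import time

N = 100_000
rows = [str(random.random()) for _ in range(N)]

# The "seq scan": touch every row exactly once, O(N).
t0 = time.perf_counter()
for r in rows:
    pass
scan_time = time.perf_counter() - t0

# Stand-in for the index build's sort step: O(N log N) comparisons,
# made more expensive by a per-key transformation (here casefold(),
# loosely imitating locale-aware text comparison).
t0 = time.perf_counter()
ordered = sorted(rows, key=lambda s: s.casefold())
sort_time = time.perf_counter() - t0

print(f"scan: {scan_time:.4f}s  sort: {sort_time:.4f}s")
```

On any realistic run the sort dwarfs the linear pass, which is why a
single shared scan feeding several parallel sorts is attractive.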
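The multi-process pg_restore idea can be sketched as: issue each CREATE
INDEX from its own worker, on its own connection, and let the server's
synchronized-scan machinery share one heap pass among them. The sketch
below uses threads (fine here, since each worker would mostly wait on
the server) and a `run_ddl` placeholder instead of a real libpq
connection; the table and index names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical DDL that a parallel restore would dispatch concurrently.
INDEX_DDL = [
    "CREATE INDEX idx_a ON big_table (col_a)",
    "CREATE INDEX idx_b ON big_table (col_b)",
    "CREATE INDEX idx_c ON big_table (col_c)",
]

def run_ddl(ddl):
    # A real restore tool would open a separate connection here and
    # execute the statement; this placeholder just echoes it back.
    return f"done: {ddl}"

# One worker per index: the builds run concurrently, so their heap
# scans can be synchronized by the server.
with ThreadPoolExecutor(max_workers=len(INDEX_DDL)) as pool:
    results = list(pool.map(run_ddl, INDEX_DDL))

for r in results:
    print(r)
```

The options question then reduces to something like a `--jobs N` knob
bounding the worker pool size.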
See: http://archives.postgresql.org/pgsql-hackers/2008-02/msg00699.php

Regards,
	Jeff Davis