With the -Fc option, as with a plain dump, you can have pg_restore restore just certain schemas or tables per command. It's a bit of manual work, but you don't have to modify the dump file(s); just build several differing commands that each operate on a different section of the database. How hard that is depends on the schema. In our case, most of the data is in partitioned tables, so we can use a regex to peel off chunks of them by date and restore those in separate processes once the parent tables and schema are in place.
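A rough sketch of the approach, using pg_restore's -l/-L list mechanism (the dump file name, database name, and sales_2008_* partition names are made up for illustration; a fake TOC listing stands in for the real `pg_restore -l` output so the steps can be tried without a dump):

```shell
# 1. Dump the archive's table of contents:
#      pg_restore -l db.dump > full.toc
# Simulated here with a hand-written TOC in the same format:
cat > full.toc <<'EOF'
215; 1259 16386 TABLE DATA public sales_2008_01 postgres
216; 1259 16387 TABLE DATA public sales_2008_02 postgres
217; 1259 16388 TABLE DATA public customers postgres
EOF

# 2. Peel off one chunk of partitions by date with a regex:
grep 'TABLE DATA public sales_2008_01' full.toc > chunk1.toc
grep 'TABLE DATA public sales_2008_02' full.toc > chunk2.toc

# 3. Restore each chunk in its own process (after schema/parents exist):
#      pg_restore -L chunk1.toc -d mydb db.dump &
#      pg_restore -L chunk2.toc -d mydb db.dump &
#      wait
cat chunk1.toc
```

Each backgrounded pg_restore gets its own backend, so the chunks load concurrently without editing the dump itself.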
It still took all day though, and it wasn’t I/O bound.
On 1/29/09 12:21 PM, "henk de wit" <henk53602@xxxxxxxxxxx> wrote:
Hi,
When I restore a database dump of approximately 130GB on PostgreSQL 8.3, it takes about 1 hour, and I noticed that index creation makes up the bulk of that time. I'm using a very fast I/O subsystem (16 Mtron Pro 7535 SSDs behind a dual 1.2Ghz IOP/4GB cache RAID controller), fast CPUs (2 quad core C2Q's at 2.6Ghz) and 32GB RAM. From monitoring the restore process, I learned that only 10 minutes are spent doing I/O, while the rest of the time is spent creating the indexes. Index creation seems to be completely CPU bound.
The problem is that only 1 CPU core is used. My other 7 cores are just sitting there doing nothing. It seems to me that creating each index, especially for different tables, is something that can be done independently.
Is there some way I can let PostgreSQL use multiple cores for creating the indexes?
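One way to get multiple cores working on independent indexes is simply to issue each CREATE INDEX from a separate connection, since each session gets its own backend. A minimal sketch of that pattern (the database, table, and index names are hypothetical, and a stand-in function replaces psql here so the job-control part is runnable as-is):

```shell
# Real usage would look like:
#   psql -d mydb -c "CREATE INDEX idx_orders_date ON orders (order_date);" &
#   psql -d mydb -c "CREATE INDEX idx_orders_cust ON orders (customer_id);" &
#   wait   # block until both backends finish
#
# Stand-in so the pattern itself can be exercised without a database:
run_one() { echo "index $1 built" >> build.log; }
: > build.log
run_one idx_orders_date &
run_one idx_orders_cust &
wait
cat build.log
```

Each backgrounded command runs in its own process, so two otherwise single-threaded index builds can proceed on two cores at once.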
Thanks in advance