Re: Load experimentation

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Andy,

Load is chugging along. We've optimized our postgres conf as much as possible but are seeing the inevitable I/O bottleneck. I had the same thought as you (converting inserts into copy's) a while back but unfortunately each file has many inserts into many different tables. Potentially I could rip through this with a little MapReduce job on 50-100 nodes, which is still something I might do.

One thought we are playing with was taking advantage of 4 x 414GB EBS devices in a RAID0 configuration. This would spread disk writes across 4 block devices.

Right now I'm wrapping about 1500 inserts in a transaction block. Since its an I/O bottlenecks, COPY statements might not give me much advantage.

Its definitely a work in progress :)

Ben


On 09/12/2009 5:31 AM, Andy Colson wrote:
On 12/07/2009 12:12 PM, Ben Brehmer wrote:
Hello All,

I'm in the process of loading a massive amount of data (500 GB). After
some initial timings, I'm looking at 260 hours to load the entire 500GB.
10 days seems like an awfully long time so I'm searching for ways to
speed this up. The load is happening in the Amazon cloud (EC2), on a
m1.large instance:
-7.5 GB memory
-4 EC2 Compute Units (2 virtual cores with 2 EC2 Compute Units each)
-64-bit platform


So far I have modified my postgresql.conf file (PostgreSQL 8.1.3). The
modifications I have made are as follows:

shared_buffers = 786432
work_mem = 10240
maintenance_work_mem = 6291456
max_fsm_pages = 3000000
wal_buffers = 2048
checkpoint_segments = 200
checkpoint_timeout = 300
checkpoint_warning = 30
autovacuum = off


There are a variety of instance types available in the Amazon cloud
(http://aws.amazon.com/ec2/instance-types/), including high memory and
high CPU. High memory instance types come with 34GB or 68GB of memory.
High CPU instance types have a lot less memory (7GB max) but up to 8
virtual cores. I am more than willing to change to any of the other
instance types.

Also, there is nothing else happening on the loading server. It is
completely dedicated to the load.

Any advice would be greatly appreciated.

Thanks,

Ben


I'm kind of curious, how goes the load? Is it done yet? Still looking at days'n'days to finish?

I was thinking... If the .sql files are really nicely formatted, it would not be too hard to whip up a perl script to run as a filter to change the statements into copy's.

Each file would have to only fill one table, and only contain inserts, and all the insert statements would have to set the same fields. (And I'm sure there could be other problems).

Also, just for the load, did you disable fsync?

-Andy


--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux