Re: Improve COPY performance for large data sets

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



A single SATA drive may not be the best performer, but:

1. It won't make a load take 12 hours unless we're talking a load that is in total, similar to the size of the disk.  A slow, newer SATA drive will read and write at at ~50MB/sec at minimum, so the whole 320GB can be scanned at 3GB per minute.  Thats ~ 5 hours.  It is not likely that 20M records is over 20GB, and at that size there is no way the disk is the bottleneck.

2. To figure out if the disk or CPU is a bottleneck, don't assume.  Check iostat or top and look at the disk utilization % and io wait times.  Check the backend process CPU utilization.  In my experience, there are many things that can cause COPY to be completely CPU bound even with slow disks -- I have seen it bound to a 5MB/sec write rate on a 3Ghz CPU, which a drive from 1998 could handle.

It seems like this case is resolved, but there are some other good tuning recommendations.  Don't blame the disk until the disk is actually showing high utilization though. 

COPY is bound typically by the disk or a single CPU.  It is usually CPU bound if there are indexes or constraints on the table, and sometimes even when there are none.

The pg_bulkload tool in almost all cases, will be significantly faster but it has limitations that make it inappropriate for some to use.



On Wed, Sep 10, 2008 at 10:14 AM, Alan Hodgson <ahodgson@xxxxxxxxx> wrote:
On Wednesday 10 September 2008, Ryan Hansen <ryan.hansen@xxxxxxxxxxxxxxxxxx>
wrote:
>Currently it's taking about 12 hours to complete on a 64 bit
> server with 3 GB memory allocated (shared_buffer), single SATA 320 GB
> drive.  I don't seem to get any improvement running the same operation
> on a dual opteron dual-core, 16 GB server.
>
> I'm not asking for someone to solve my problem, just some direction in
> the best ways to tune for faster bulk loading, since this will be a
> fairly regular operation for our application (assuming it can work this
> way).  I've toyed with the maintenance_work_mem and some of the other
> params, but it's still way slower than it seems like it should be.
> So any contributions are much appreciated.

Your drive subsystem, such as it is, is inappropriate for a database. Your
bottleneck is your drive.

Turning fsync off might help. You should also drop all indexes on the table
before the COPY and add them back after (which would eliminate a lot of
random I/O during the COPY).

--
Alan

--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux