Rural Hunter <ruralhunter@xxxxxxxxx> wrote:

> yes, the size of my database(around 200g) is very close to yours.
> How long did it take you to migrate the data(100-200G)?

I don't remember off-hand, and I'm not sure it's worth digging for the
info because so much has changed in the intervening six years. Recent
PostgreSQL releases are much faster, and our current hardware is also
much faster. I just ran bonnie++ on a new server that just arrived
(yet to be put into production) and got 619930 KB/sec for sequential
writes. 200 GB at that rate is 338 seconds. :-)

> Another problem is, from my test, the performance of bulk insert
> looks not very good. I estimated the importing of 30G data would
> take about 10-20 hours based on the progress. I already did some
> tweaks such as:
> fsync=off
> archive_mode=off

Good (but don't forget to change those back once the bulk load is
done). You should probably also turn off full_page_writes and
synchronous_commit. I've seen benchmarks which show that this helps,
even with the other settings you mention.

> Increase checkpoint_segments

There can be paradoxical results with that. For reasons yet to be
determined, bulk conversion (unlike most workloads) sometimes runs
faster with a small setting like the default. You have to test to see
how it works in your environment.

> drop indexes and primary keys

Yeah, you definitely want to build those only after the data for a
table is loaded. I also recommend a VACUUM FREEZE ANALYZE on the
database unless most of these rows will be deleted or updated before
you run a billion database transactions. Otherwise you will get a
painful "anti-wraparound" autovacuum on everything, probably at a
time of heavy usage.

> I have about 5G memory free on the server and have these memory
> settings:
> shared_buffers = 1GB
> work_mem = 8MB
> maintenance_work_mem = 1GB
> effective_cache_size = 4GB

Reasonable for that amount of RAM, but that seems like underpowered
hardware for the size of the database. If the production conversion
is going to be run against different hardware, these tests may not be
giving you very good numbers for what to expect.

> Not sure if there is any room to increase the performance of bulk
> insert. I monitored the server when the import running. The
> utilization of CPU/disk is very low.

We tended to see low disk utilization and saturated CPUs. We worked
around this by using a small queue of rows in the Java conversion
process, with one thread reading and a separate thread writing. We
also ran several conversion processes at once, on separate tables.

See about using COPY, either through the JDBC driver's support for it
or by outputting your files in a format COPY can digest. It can be
significantly faster than INSERT, even if you optimize the INSERTs by
using prepared statements and a large number of rows per transaction.
Also, if at all possible, load the data within the same database
transaction where the table is created. This can allow for an
optimization where the data is not logged to WAL, because if the
transaction aborts the table is not there anyway. (Rough sketches of
both techniques are below.)

> The memory usage seems no much change when pgsql is up or down. So
> looks the performance bottleneck is not on cpu/disk/memory. I'm a
> bit lost on this and have no idea what to check/change.

Network latency? Avoid encrypted connections for the load, and do
whatever you can to minimize latency, like attaching both machines to
the same switch. You can't improve performance much if you're working
on the things which are only using a small part of the time.
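To make the queue-plus-COPY suggestion more concrete, here is a rough
sketch of the pattern, assuming the PostgreSQL JDBC driver's
CopyManager API. The connection strings, table name, and column list
are invented for illustration (they are not from your schema), and the
source-side query is just a stand-in for whatever your conversion
reads:

// Sketch only: one thread reads rows from the source system and formats
// them as tab-delimited COPY text; the main thread streams them into
// PostgreSQL through the JDBC driver's COPY support.
// All names and URLs here are hypothetical.
import org.postgresql.PGConnection;
import org.postgresql.copy.CopyIn;
import org.postgresql.copy.CopyManager;

import java.nio.charset.StandardCharsets;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class CopyLoader {
    private static final String EOF = "\u0000";  // sentinel marking end of input

    public static void main(String[] args) throws Exception {
        // Small bounded queue, so neither thread runs far ahead of the other.
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(1000);

        // Reader thread: pulls rows from the source and formats COPY text lines.
        Thread reader = new Thread(() -> {
            try (Connection src = DriverManager.getConnection(
                     "jdbc:somesource://srchost/srcdb", "user", "pw");  // placeholder source URL
                 Statement st = src.createStatement();
                 ResultSet rs = st.executeQuery("SELECT id, name, amount FROM big_table")) {
                while (rs.next()) {
                    queue.put(rs.getLong(1) + "\t" + rs.getString(2) + "\t"
                              + rs.getBigDecimal(3) + "\n");
                }
                queue.put(EOF);
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        });
        reader.start();

        // Writer (this thread): streams the queued rows into PostgreSQL via COPY.
        try (Connection pg = DriverManager.getConnection(
                 "jdbc:postgresql://pghost/targetdb", "user", "pw")) {
            try (Statement st = pg.createStatement()) {
                // Session-level setting; fsync, full_page_writes and archive_mode
                // have to be changed in postgresql.conf instead.
                st.execute("SET synchronous_commit TO off");
            }
            CopyManager cm = pg.unwrap(PGConnection.class).getCopyAPI();
            CopyIn copy = cm.copyIn("COPY big_table (id, name, amount) FROM STDIN");
            String row;
            while (!(row = queue.take()).equals(EOF)) {
                byte[] bytes = row.getBytes(StandardCharsets.UTF_8);
                copy.writeToCopy(bytes, 0, bytes.length);
            }
            System.out.println("Loaded " + copy.endCopy() + " rows");
        }
        reader.join();
    }
}

The bounded queue is what keeps one side from stalling the other; in
our case that was enough to get the CPUs on both ends busy at the same
time. Real code would also need to escape tabs, newlines and
backslashes in text columns, and want better error handling than this.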
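And here is a minimal sketch of loading within the same transaction
that creates the table (again, table, column, and file names are
invented). Note that the WAL-skipping optimization only kicks in when
the server is not archiving or streaming WAL (wal_level = minimal),
which matches the archive_mode=off setting you are already using for
the load:

// Sketch only: create the table and COPY into it in one transaction, so
// the server can skip WAL-logging the loaded data. Names are hypothetical.
import org.postgresql.PGConnection;
import org.postgresql.copy.CopyManager;

import java.io.FileReader;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class CreateAndLoad {
    public static void main(String[] args) throws Exception {
        try (Connection pg = DriverManager.getConnection(
                 "jdbc:postgresql://pghost/targetdb", "user", "pw")) {
            pg.setAutoCommit(false);  // keep CREATE TABLE and COPY in one transaction
            try (Statement st = pg.createStatement()) {
                st.execute("CREATE TABLE big_table (id bigint, name text, amount numeric)");
            }
            CopyManager cm = pg.unwrap(PGConnection.class).getCopyAPI();
            try (FileReader data = new FileReader("/tmp/big_table.tsv")) {
                cm.copyIn("COPY big_table FROM STDIN", data);
            }
            pg.commit();  // build indexes and the primary key after this, per the advice above
        }
    }
}

The same idea works outside Java: a server-side COPY ... FROM 'file',
or psql's \copy, gets the benefit as long as the COPY runs in the
transaction that created the table.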
Identifying the source of your delays is the most important thing at
this point.

-Kevin