Rural Hunter <ruralhunter@xxxxxxxxx> wrote:

> yes, the size of my database(around 200g) is very close to yours.
> How long did it take you to migrate the data(100-200G)?

I don't remember off-hand, and I'm not sure it's worth digging for the
info because so much has changed in the intervening six years. Recent
PostgreSQL releases are much faster, and our current hardware is also
much faster. I just ran bonnie++ on a new server that just arrived
(yet to be put into production) and got 619930 KB/sec for sequential
writes. 200 GB at that rate is 338 seconds. :-)

> Another problem is, from my test, the performance of bulk insert
> looks not very good. I estimated the importing of 30G data would
> take about 10-20 hours based on the progress. I already did some
> tweaks such as:
> fsync=off
> archive_mode=off

Good (but don't forget to change those back once the bulk load is
done). You should probably also turn off full_page_writes and
synchronous_commit. I've seen benchmarks which show that this helps,
even with the other settings you mention.

> Increase checkpoint_segments

There can be paradoxical results with that. For reasons yet to be
determined, bulk conversion (unlike most workloads) sometimes runs
faster with a small setting like the default. You have to test to see
how it works in your environment.

> drop indexes and primary keys

Yeah, you definitely want to build those only after the data for a
table is loaded. I also recommend a VACUUM FREEZE ANALYZE on the
database unless most of these rows will be deleted or updated before
you run a billion database transactions. Otherwise you will get a
painful "anti-wraparound" autovacuum on everything, probably at a
time of heavy usage.

> I have about 5G memory free on the server and have these memory
> settings:
> shared_buffers = 1GB
> work_mem = 8MB
> maintenance_work_mem = 1GB
> effective_cache_size = 4GB

Reasonable for that amount of RAM, but that seems like underpowered
hardware for the size of the database. If the production conversion
is going to be run against different hardware, these tests may not be
giving you very good numbers for what to expect.

> Not sure if there is any room to increase the performance of bulk
> insert. I monitored the server when the import running. The
> utilization of CPU/disk is very low.

We tended to see low disk utilization and saturated CPUs. We worked
around this by using a small queue of rows in the Java conversion
process, with one thread reading and a separate thread writing. We
also ran several conversion processes at once, on separate tables.

See about using COPY, either through the JDBC driver's support for it
or by outputting your files in a format COPY can digest. It can be
significantly faster than INSERT, even if you optimize the INSERTs by
using prepared statements and a large number of rows per transaction.
Also, if at all possible, load the data within the same database
transaction where the table is created. This can allow for an
optimization where the data is not logged to WAL, because if the
transaction aborts the table is not there anyway. (Rough sketches of
both techniques are below.)

> The memory usage seems no much change when pgsql is up or down. So
> looks the performance bottleneck is not on cpu/disk/memory. I'm a
> bit lost on this and have no idea what to check/change.

Network latency? Avoid encrypted connections for the load, and do
whatever you can to minimize latency, like attaching both machines to
the same switch. You can't improve performance much if you're working
on the things which are only using a small part of the time.
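To make the queue-plus-COPY suggestion more concrete, here is a rough
sketch of the pattern, assuming the PostgreSQL JDBC driver's
CopyManager API. The connection strings, table name, and column list
are invented for illustration (they are not from your schema), and the
source-side query is just a stand-in for whatever your conversion
reads:

// Sketch only: one thread reads rows from the source system and formats
// them as tab-delimited COPY text; the main thread streams them into
// PostgreSQL through the JDBC driver's COPY support.
// All names and URLs here are hypothetical.
import org.postgresql.PGConnection;
import org.postgresql.copy.CopyIn;
import org.postgresql.copy.CopyManager;

import java.nio.charset.StandardCharsets;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class CopyLoader {
    private static final String EOF = "\u0000";  // sentinel marking end of input

    public static void main(String[] args) throws Exception {
        // Small bounded queue, so neither thread runs far ahead of the other.
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(1000);

        // Reader thread: pulls rows from the source and formats COPY text lines.
        Thread reader = new Thread(() -> {
            try (Connection src = DriverManager.getConnection(
                     "jdbc:somesource://srchost/srcdb", "user", "pw");  // placeholder source URL
                 Statement st = src.createStatement();
                 ResultSet rs = st.executeQuery("SELECT id, name, amount FROM big_table")) {
                while (rs.next()) {
                    queue.put(rs.getLong(1) + "\t" + rs.getString(2) + "\t"
                              + rs.getBigDecimal(3) + "\n");
                }
                queue.put(EOF);
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        });
        reader.start();

        // Writer (this thread): streams the queued rows into PostgreSQL via COPY.
        try (Connection pg = DriverManager.getConnection(
                 "jdbc:postgresql://pghost/targetdb", "user", "pw")) {
            try (Statement st = pg.createStatement()) {
                // Session-level setting; fsync, full_page_writes and archive_mode
                // have to be changed in postgresql.conf instead.
                st.execute("SET synchronous_commit TO off");
            }
            CopyManager cm = pg.unwrap(PGConnection.class).getCopyAPI();
            CopyIn copy = cm.copyIn("COPY big_table (id, name, amount) FROM STDIN");
            String row;
            while (!(row = queue.take()).equals(EOF)) {
                byte[] bytes = row.getBytes(StandardCharsets.UTF_8);
                copy.writeToCopy(bytes, 0, bytes.length);
            }
            System.out.println("Loaded " + copy.endCopy() + " rows");
        }
        reader.join();
    }
}

The bounded queue is what keeps one side from stalling the other; in
our case that was enough to get the CPUs on both ends busy at the same
time. Real code would also need to escape tabs, newlines and
backslashes in text columns, and want better error handling than this.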
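And here is a minimal sketch of loading within the same transaction
that creates the table (again, table, column, and file names are
invented). Note that the WAL-skipping optimization only kicks in when
the server is not archiving or streaming WAL (wal_level = minimal),
which matches the archive_mode=off setting you are already using for
the load:

// Sketch only: create the table and COPY into it in one transaction, so
// the server can skip WAL-logging the loaded data. Names are hypothetical.
import org.postgresql.PGConnection;
import org.postgresql.copy.CopyManager;

import java.io.FileReader;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class CreateAndLoad {
    public static void main(String[] args) throws Exception {
        try (Connection pg = DriverManager.getConnection(
                 "jdbc:postgresql://pghost/targetdb", "user", "pw")) {
            pg.setAutoCommit(false);  // keep CREATE TABLE and COPY in one transaction
            try (Statement st = pg.createStatement()) {
                st.execute("CREATE TABLE big_table (id bigint, name text, amount numeric)");
            }
            CopyManager cm = pg.unwrap(PGConnection.class).getCopyAPI();
            try (FileReader data = new FileReader("/tmp/big_table.tsv")) {
                cm.copyIn("COPY big_table FROM STDIN", data);
            }
            pg.commit();  // build indexes and the primary key after this, per the advice above
        }
    }
}

The same idea works outside Java: a server-side COPY ... FROM 'file',
or psql's \copy, gets the benefit as long as the COPY runs in the
transaction that created the table.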
Identifying the source of your delays is the most important thing at
this point.

-Kevin